X · @togethercompute · X / Twitter

LLMs are getting better at writing GPU kernels. Multi-GPU kernels are the harder test.

At @aiDotEngineer World's Fair, @simran_s_arora will share ParallelKernelBench, an open-source benchmark built from real CUDA communication problems where performance depends on moving data efficiently over NVLink.

Day 2, June 30, 12:0