X · @togethercompute
LLMs are getting better at writing GPU kernels. Multi-GPU kernels are the harder test.

At @aiDotEngineer World's Fair, @simran_s_arora will share ParallelKernelBench, an open-source benchmark built from real CUDA communication problems where performance depends on moving data efficiently over NVLink.

Day 2, June 30, 12:0