Poster
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang · Simon Guo · Simran Arora · Alex Zhang · William Hu · Christopher Re · Azalia Mirhoseini
East Exhibition Hall A-B #E-2010
Modern AI systems demand massive computational power, delivered by dedicated AI hardware such as GPUs. To use this hardware effectively, engineers write specialized programs called GPU kernels, but developing kernels is difficult and time-consuming because of the deep domain expertise it requires.

We investigate whether language models can automatically generate these complex GPU kernels. To test this, we created KernelBench, a comprehensive benchmark of 250 real-world AI workloads that could be accelerated with performant kernels. We found that today's models struggle significantly with this task: the best models produce kernels that match PyTorch's performance in fewer than 20% of cases. Just as expert engineers iteratively refine their code, we found that leveraging execution feedback helps models improve their generated kernels; the improvement remains limited, however, and writing efficient GPU kernels still poses a challenge for current AI systems.

Progress on KernelBench translates directly into faster, more efficient kernels that could reduce energy consumption and accelerate AI development. Additionally, KernelBench serves as a research environment for improving language models on this challenging, performance-critical code generation task.
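The sketch below illustrates the evaluation idea described above, under the assumption that each task pairs a PyTorch reference workload with a candidate kernel that must match the reference's output and is then timed against it. The names `ReferenceModel`, `candidate_forward`, and `evaluate` are illustrative placeholders, not the actual KernelBench API.

```python
import time
import torch


class ReferenceModel(torch.nn.Module):
    """Reference workload: the PyTorch baseline the generated kernel must beat."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2.0


def candidate_forward(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for an LLM-generated kernel (e.g. compiled with torch.utils.cpp_extension)."""
    return torch.clamp(x, min=0.0) * 2.0


def evaluate(ref: torch.nn.Module, cand, x: torch.Tensor, iters: int = 100) -> float:
    """Check correctness against the reference, then return the candidate's speedup over PyTorch."""
    # Correctness: candidate output must match the PyTorch reference.
    torch.testing.assert_close(cand(x), ref(x))

    def time_fn(fn) -> float:
        # Simple wall-clock timing; real GPU benchmarking would also warm up and use CUDA events.
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    return time_fn(lambda t: ref(t)) / time_fn(cand)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4096, 4096, device=device)
    speedup = evaluate(ReferenceModel().to(device), candidate_forward, x)
    print(f"speedup over PyTorch: {speedup:.2f}x")
```

In this framing, iterative refinement with execution feedback simply feeds the correctness errors and timing results from `evaluate` back to the model so it can revise its kernel and try again.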