Speeding up GPU kernels by 38% with a multi-agent system

Source: Cursor Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, medium

Summary

A multi-agent system autonomously optimized 235 CUDA kernels for NVIDIA Blackwell B200 GPUs and achieved a 38% geometric mean speedup over baselines in three weeks. The system was developed in collaboration with NVIDIA. It tackled kernel optimization problems that typically take highly experienced engineers months or years. The multi-agent harness ran autonomously, building and optimizing kernels down to the assembly level.

The system used NVIDIA's SOL-ExecBench for two purposes: to generate real-world optimization problems drawn from more than 124 production open-source models, such as DeepSeek and Gemma, and to benchmark solutions on 27 Blackwell B200 GPUs. It outperformed the baselines on 149 of the 235 problems (63%), and 19% of the optimizations delivered speedups greater than 2x. These results suggest the system explores a broader solution space than manual approaches, which tend to rely on simplifying the problem.
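The headline number is a geometric mean, which is the usual way to average ratios such as speedups. The Python sketch below shows the calculation. It assumes a "38% speedup" means a geometric mean ratio of about 1.38x, where each ratio is baseline time divided by optimized time. The kernel timings are invented for illustration and do not come from the article.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need equal-length, non-empty timing lists")
    # Average the logs of the ratios, then exponentiate back.
    log_sum = 0.0
    for base, opt in zip(baseline_ms, optimized_ms):
        log_sum += math.log(base / opt)
    return math.exp(log_sum / len(baseline_ms))


# Hypothetical timings for three kernels, in milliseconds.
baseline = [2.0, 1.0, 4.0]
optimized = [1.0, 1.0, 2.0]
speedup = geometric_mean_speedup(baseline, optimized)
print(f"{speedup:.3f}x ({(speedup - 1) * 100:.1f}% faster)")  # prints "1.587x (58.7% faster)"
```

With a geometric mean, a 2x win on one kernel and a 2x regression (0.5x) on another average to exactly 1.0x. An arithmetic mean of the same two ratios would report 1.25x. This symmetry is why the geometric mean is preferred for summarizing speedup ratios.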

Key takeaway

For AI engineers and MLOps practitioners focused on GPU performance, this work points to a shift in how kernel optimization gets done. Consider multi-agent architectures for long-tail optimization problems that are impractical to tackle manually. They could reduce latency and cost per token for AI training and inference workloads on NVIDIA GPUs.

Key insights

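Multi-agent systems can autonomously optimize complex software and achieve significant performance gains in weeks. Here, that meant a 38% geometric mean speedup across 235 kernels in three weeks, for work that would usually take expert engineers months or years.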

Principles

Method

A planner agent distributes work across autonomous worker agents and rebalances it based on performance metrics. The workers continuously test, debug, and optimize kernels without developer intervention. The agents coordinate through a single markdown file.
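The planner loop can be sketched in Python as follows. The article does not describe the planner's scheduling rule or the layout of the coordination file. Everything below is therefore hypothetical: the `Planner` and `Problem` classes, the "least-improved problem first" rule, the markdown table format, and the kernel and worker names. It illustrates rebalancing on performance metrics with shared markdown state. It is not Cursor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    name: str                  # hypothetical kernel/problem identifier
    best_speedup: float = 1.0  # best speedup over baseline found so far
    attempts: int = 0          # how many worker rounds it has been assigned


@dataclass
class Planner:
    problems: list[Problem]
    assignments: dict[str, str] = field(default_factory=dict)  # worker -> problem

    def rebalance(self, workers: list[str]) -> dict[str, str]:
        """Assign each worker one of the least-improved, least-attempted problems."""
        queue = sorted(self.problems, key=lambda p: (p.best_speedup, p.attempts))
        self.assignments = {}
        for worker, problem in zip(workers, queue):
            self.assignments[worker] = problem.name
            problem.attempts += 1
        return self.assignments

    def record(self, name: str, speedup: float) -> None:
        """Store a worker's benchmark result only if it beats the current best."""
        for p in self.problems:
            if p.name == name and speedup > p.best_speedup:
                p.best_speedup = speedup

    def to_markdown(self) -> str:
        """Render the shared state as the markdown table workers would read."""
        owner = {prob: worker for worker, prob in self.assignments.items()}
        lines = ["| problem | best speedup | worker |", "|---|---|---|"]
        for p in self.problems:
            lines.append(f"| {p.name} | {p.best_speedup:.2f}x | {owner.get(p.name, '-')} |")
        return "\n".join(lines)


# Hypothetical run: two workers, three problems.
planner = Planner([Problem("rmsnorm"), Problem("moe_gate"), Problem("attn_decode")])
planner.rebalance(["worker-0", "worker-1"])
planner.record("rmsnorm", 1.8)               # a worker reports a 1.8x result
planner.rebalance(["worker-0", "worker-1"])  # rmsnorm drops to the back of the queue
print(planner.to_markdown())
```

Sorting on `(best_speedup, attempts)` steers workers toward problems that have improved least and been tried least. This is one simple way to spread effort across a long tail of kernels. A real planner would likely weigh more signals, such as build failures or time spent per attempt.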

Best for: MLOps Engineer, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Cursor Blog.