daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Expert, quick

Summary

daVinci-kernel is a reinforcement learning framework for GPU kernel optimization, focusing on execution efficiency while assuming functional correctness. It co-evolves skill selection, summarization, and utilization through a dynamically evolving skill library. Three agents share a single LLM backbone: a Skill Selection Agent that retrieves relevant techniques using BM25 and LLM reranking, a Policy Agent that generates CUDA/Triton kernels over multiple turns based on the selected skills, and a Skill Summary Agent that distills successful rollouts into reusable skills. Candidate skills are incorporated only after execution-based verification confirms reproducible speedups. The agents are initialized via a structured SFT cold start on diversity-filtered data, then jointly optimized end-to-end with multi-turn REINFORCE and per-agent advantage estimation. On KernelBench, daVinci-kernel-14B achieved 37.2%, 70.6%, and 32.2% on Level 1, Level 2, and Level 3 under the Fast$_1$ threshold (in KernelBench's fast_p metric, the share of tasks whose generated kernel is correct and runs faster than the PyTorch reference), surpassing Dr.Kernel-14B.
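The retrieval half of the Skill Selection Agent can be illustrated with a minimal sketch. It implements standard Okapi BM25 scoring over skill descriptions and leaves the LLM reranking step as an optional callable. The names `bm25_rank`, `select_skills`, and `rerank`, the parameter values, and the example skill texts are all hypothetical; the summary does not describe the paper's actual tokenizer, parameters, or library contents.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for illustration only.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def select_skills(query, skills, top_k=2, rerank=None):
    """BM25 shortlist, optionally reordered by a hypothetical LLM reranker."""
    shortlist = [skills[i] for i in bm25_rank(query, skills)[:top_k]]
    return rerank(query, shortlist) if rerank else shortlist


# Hypothetical skill library entries.
skills = [
    "shared memory tiling for matrix multiplication",
    "fuse elementwise activation into preceding kernel",
    "warp shuffle reduction for softmax row sums",
]
selected = select_skills("optimize softmax reduction kernel", skills)
```

Here the softmax-related skill ranks first because it matches two query terms. In a full system the `rerank` callable would be backed by the shared LLM.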

Key takeaway

For machine learning engineers and AI hardware engineers working on GPU kernel optimization, daVinci-kernel shows that a skill library can be learned jointly with the policy that uses it. Consider similar co-evolutionary skill-learning frameworks when execution efficiency is the goal. Because skills are adopted only after execution-based verification confirms a speedup, the library stays grounded in measured gains, and on KernelBench the 14B model surpasses the RL-trained Dr.Kernel-14B. The multi-agent design on a shared LLM backbone is worth exploring for performance-critical projects.
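A verification gate of this kind might be sketched as follows. The acceptance rule (every repeated run must beat the baseline median by a margin), the thresholds, and the names `is_reproducible_speedup` and `SkillLibrary` are assumptions made for illustration; the summary says only that skills are admitted after execution-based verification confirms reproducible speedups.

```python
from statistics import median


def is_reproducible_speedup(baseline_ms, candidate_ms, min_speedup=1.05, min_runs=3):
    """Hypothetical gate: each repeated run must beat the baseline median by min_speedup."""
    if len(candidate_ms) < min_runs:
        return False  # Too few repetitions to call the result reproducible.
    base = median(baseline_ms)
    return all(base / t >= min_speedup for t in candidate_ms)


class SkillLibrary:
    """Holds only the skills whose speedup passed the gate."""

    def __init__(self):
        self.skills = []

    def try_add(self, skill, baseline_ms, candidate_ms):
        if is_reproducible_speedup(baseline_ms, candidate_ms):
            self.skills.append(skill)
            return True
        return False


library = SkillLibrary()
baseline = [10.0, 10.2, 9.8]  # Illustrative timings in milliseconds.
accepted = library.try_add("vectorized loads", baseline, [8.0, 8.1, 7.9])
rejected = library.try_add("loop unrolling", baseline, [8.0, 9.9, 8.1])
```

The second candidate is rejected because one of its runs is barely faster than the baseline, so the speedup is not considered reproducible under this rule.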

Key insights

A reinforcement learning framework co-evolves skill selection, summarization, and utilization for GPU kernel optimization.

Principles

Method

Initialize three LLM-backed agents via SFT cold start, then jointly optimize with multi-turn REINFORCE, adding skills only after verified speedups.
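One way to read "per-agent advantage estimation" is that each agent's rewards are compared against a baseline computed from that agent's own samples rather than a pooled one. The sketch below takes that reading with a simple mean baseline and a plain REINFORCE surrogate loss. The function names, the mean baseline, and the sample rewards are assumptions; the summary does not give the paper's reward design or its multi-turn credit assignment.

```python
from collections import defaultdict
from statistics import mean


def per_agent_advantages(samples):
    """samples: list of (agent_name, reward) pairs.

    Returns advantages in the same order, each computed as the reward
    minus the mean reward of that agent's own samples.
    """
    by_agent = defaultdict(list)
    for agent, reward in samples:
        by_agent[agent].append(reward)
    baselines = {agent: mean(rewards) for agent, rewards in by_agent.items()}
    return [reward - baselines[agent] for agent, reward in samples]


def reinforce_loss(log_probs, advantages):
    """REINFORCE surrogate: negative mean of advantage-weighted log-probabilities."""
    weighted = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -weighted / len(log_probs)


# Illustrative rewards for two of the three agents.
samples = [("selector", 1.0), ("selector", 0.0), ("policy", 2.0), ("policy", 4.0)]
advantages = per_agent_advantages(samples)
loss = reinforce_loss([-1.0, -1.0, -2.0, -0.5], advantages)
```

Separate baselines keep an agent with systematically larger rewards from dominating the gradient signal of the others, which is one plausible motivation for estimating advantages per agent.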

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.