AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
Summary
AscendKernelGen is a framework that automates the generation of high-performance compute kernels for Neural Processing Units (NPUs), specifically Huawei's Ascend platform. General-purpose Large Language Models (LLMs) fail to generate functional NPU kernels because of strict hardware constraints and scarce training data, so AscendKernelGen integrates generation with evaluation. It comprises Ascend-CoT, a high-quality dataset incorporating chain-of-thought reasoning from real-world kernel implementations, and KernelGen-LM, a domain-adaptive LLM fine-tuned with supervised learning and reinforcement learning using execution feedback. The framework also includes NPUKernelBench, a comprehensive benchmark that evaluates compilation, correctness, and performance across varying complexity levels. On complex Level-2 kernels, compilation success rises from 0% to 95.5% (Pass@10), and functional correctness reaches 64.3%, where the baseline fails completely.
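The Pass@10 figure above is a pass@k metric. The summary does not say which estimator was used, so the sketch below assumes the unbiased estimator that is common in code-generation evaluation: given n generated samples of which c pass, it estimates the probability that at least one of k samples drawn without replacement passes. The function name `pass_at_k` is illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which passed.

    Assumption: this is the commonly used unbiased estimator,
    1 - C(n - c, k) / C(n, k); the summarized paper may compute
    Pass@10 differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `pass_at_k(10, 0, 10)` is 0.0 (no sample passes) and `pass_at_k(10, 1, 10)` is 1.0 (one passing sample among ten is enough when all ten are drawn).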
Key takeaway
For research scientists developing high-performance kernels for specialized hardware such as NPUs, AscendKernelGen demonstrates that domain-adaptive LLM training is essential. Prioritize high-quality, reasoning-rich datasets and a robust, hardware-grounded evaluation framework; together they deliver the gains in compilation success and functional correctness that general-purpose LLMs cannot reach.
Key insights
Domain-specific reasoning and rigorous evaluation are crucial for LLM-based NPU kernel generation.
Principles
- Hardware-specific code generation requires deep domain adaptation.
- Evaluation must include compilation, correctness, and performance.
- Error-derived supervision improves model robustness.
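The second principle can be read as a gated pipeline: a kernel is only checked for correctness if it compiles, and only scored for performance if it is correct. The sketch below illustrates that ordering under stated assumptions. It is not NPUKernelBench's interface: `EvalResult`, `evaluate_kernel`, and `compile_fn` are hypothetical names, the "kernel" is a plain Python callable standing in for a compiled NPU kernel, and timings are assumed to be measured elsewhere and passed in.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Kernel = Callable[[Sequence[float]], Sequence[float]]


@dataclass
class EvalResult:
    compiled: bool
    correct: bool
    speedup: Optional[float]  # baseline time / kernel time, set only if correct


def evaluate_kernel(
    compile_fn: Callable[[], Optional[Kernel]],  # hypothetical: returns None on compile failure
    reference_fn: Kernel,
    inputs: Sequence[float],
    kernel_time_s: float,    # assumed to be measured on the device elsewhere
    baseline_time_s: float,
    atol: float = 1e-5,
) -> EvalResult:
    # Stage 1: compilation.
    kernel = compile_fn()
    if kernel is None:
        return EvalResult(False, False, None)

    # Stage 2: functional correctness against a reference implementation.
    try:
        got = kernel(inputs)
    except Exception:
        return EvalResult(True, False, None)
    want = reference_fn(inputs)
    matches = len(got) == len(want) and all(
        abs(g - w) <= atol for g, w in zip(got, want)
    )
    if not matches:
        return EvalResult(True, False, None)

    # Stage 3: performance, reported only for correct kernels.
    return EvalResult(True, True, baseline_time_s / kernel_time_s)
```

Gating the stages this way keeps a fast but wrong kernel from receiving a performance score.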
Method
AscendKernelGen uses a two-stage training strategy: supervised fine-tuning with error-derived supervision for foundational knowledge, followed by reinforcement learning with execution-based preference signals for fine-grained optimization.
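To make "execution-based preference signals" concrete, here is a minimal sketch of how execution outcomes could be turned into preference pairs for the second stage. The summary does not specify the ranking rule, the data format, or the reinforcement learning algorithm, so everything here is an assumption: the three-level ranking (correct > compiles only > fails to compile), the `Candidate` type, and the `prompt`/`chosen`/`rejected` record layout are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    code: str
    compiled: bool
    correct: bool


def execution_rank(c: Candidate) -> int:
    """Assumed ordering: correct (2) > compiles only (1) > compile failure (0)."""
    if c.compiled and c.correct:
        return 2
    if c.compiled:
        return 1
    return 0


def build_preference_pairs(prompt: str, candidates: List[Candidate]) -> List[Dict[str, str]]:
    """Pair up candidates for one prompt; the higher-ranked one is 'chosen'."""
    pairs = []
    for a, b in combinations(candidates, 2):
        rank_a, rank_b = execution_rank(a), execution_rank(b)
        if rank_a == rank_b:
            continue  # equal outcomes carry no preference signal
        chosen, rejected = (a, b) if rank_a > rank_b else (b, a)
        pairs.append(
            {"prompt": prompt, "chosen": chosen.code, "rejected": rejected.code}
        )
    return pairs
```

Candidates with identical execution outcomes are skipped, so a prompt where every sample fails in the same way contributes no pairs under this scheme.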
In practice
- Use the Ascend-CoT dataset to fine-tune models for NPU kernel development.
- Employ NPUKernelBench for comprehensive kernel evaluation.
- Implement error-derived SFT to reduce numerical failures.
Topics
- AscendKernelGen
- Neural Processing Units
- LLM-based Kernel Generation
- Domain-Specific Languages
- Ascend-CoT Dataset
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.