AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, extended

Summary

AscendKernelGen is a framework for automating the generation of high-performance compute kernels for Neural Processing Units (NPUs), specifically Huawei's Ascend platform. General-purpose Large Language Models (LLMs) fail to generate functional NPU kernels because of strict hardware constraints and scarce training data, so AscendKernelGen integrates generation with evaluation. It comprises Ascend-CoT, a high-quality dataset that incorporates chain-of-thought reasoning derived from real-world kernel implementations, and KernelGen-LM, a domain-adapted LLM trained with supervised fine-tuning followed by reinforcement learning on execution feedback. The framework also includes NPUKernelBench, a benchmark that evaluates compilation, correctness, and performance across kernels of varying complexity. On complex Level-2 kernels, compilation success rises from 0% to 95.5% (Pass@10), and functional correctness reaches 64.3%, where the baseline produced no correct kernels at all.
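The Pass@10 figure above is a pass@k metric: the probability that at least one of k sampled generations for a task succeeds. The summary does not say how the paper estimates it, so the sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n generations of which c pass. The function names `pass_at_k` and `benchmark_pass_at_k` are illustrative and do not come from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of generations sampled for the task
    c: number of those generations that passed (compiled, or ran correctly)
    k: sample budget being scored, e.g. 10 for Pass@10

    Assumption: this is the standard unbiased estimator; the paper's exact
    procedure is not stated in the summary.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over a benchmark; results holds (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With n = k = 10, a task scores 1.0 as soon as any one of its ten generations passes, and 0.0 otherwise, so a 0% baseline means no sampled kernel passed on any task.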

Key takeaway

For research scientists developing high-performance kernels for specialized hardware such as NPUs, AscendKernelGen demonstrates that domain-adaptive LLM training is essential. Prioritize building high-quality, reasoning-rich datasets, and pair them with a robust, hardware-grounded evaluation framework to improve compilation success and functional correctness beyond what general-purpose LLMs achieve.

Key insights

Domain-specific reasoning and rigorous evaluation are crucial for LLM-based NPU kernel generation.

Method

AscendKernelGen uses a two-stage training strategy: supervised fine-tuning with error-derived supervision for foundational knowledge, followed by reinforcement learning with execution-based preference signals for fine-grained optimization.
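One way to picture execution-based preference signals is to rank candidate kernels for the same task by how far they get through evaluation (compile, then correctness, then speed) and pair each better candidate with each worse one. The sketch below is a hypothetical illustration of that idea, not the paper's procedure: `ExecResult`, `rank_key`, `preference_pairs`, and the latency field are all assumed names, and the actual reward design may differ.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class ExecResult:
    """Hypothetical record of one candidate kernel's evaluation outcome."""
    code: str
    compiled: bool
    correct: bool
    latency_us: Optional[float] = None  # only meaningful if correct


def rank_key(r: ExecResult) -> tuple[int, float]:
    """Higher is better: stage reached first, then lower latency."""
    # Stage 0: failed to compile, 1: compiled but wrong, 2: compiled and correct.
    stage = int(r.compiled) + int(r.compiled and r.correct)
    if stage == 2 and r.latency_us is not None:
        speed = -r.latency_us
    else:
        speed = float("-inf")  # latency is ignored for non-correct kernels
    return (stage, speed)


def preference_pairs(results: list[ExecResult]) -> list[tuple[str, str]]:
    """Build (chosen, rejected) code pairs from candidates for one task."""
    pairs = []
    for a, b in combinations(results, 2):
        ka, kb = rank_key(a), rank_key(b)
        if ka == kb:
            continue  # no preference signal between equally ranked candidates
        chosen, rejected = (a, b) if ka > kb else (b, a)
        pairs.append((chosen.code, rejected.code))
    return pairs
```

Pairs built this way could feed a preference-optimization objective. Ranking by stage before latency keeps a fast but incorrect kernel from ever being preferred over a correct one.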

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.