Axon: A Synthesizing Superoptimizer for Tensor Programs

2026-06-24 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

Axon is a synthesizing superoptimizer for tensor programs that targets the difficulty of writing high-performance kernels for AI accelerators. It uses program synthesis to generate target instructions from semantic specifications, explores semantically equivalent program variants, and empirically selects the best-performing kernel. Axon discovers algebraic transformations by propagating operators through computation graphs and uses SMT over unbounded tensors to guarantee that each transformation preserves semantics, without requiring hand-crafted rewrite rules. It also lowers tensor operations to target ISA instructions, explores tiling configurations based on hardware descriptions, and fuses operators and instructions to minimize memory traffic. The work focuses on tile-based AI accelerator programs.
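To make the idea of propagating an operator through a computation graph concrete, the sketch below pushes a transpose through a matrix product, using the identity (AB)^T = B^T A^T. This is an illustration only, not Axon's method: in place of an SMT proof over unbounded tensors it uses randomized testing on small integer matrices, which can refute an equivalence but never prove one. All function names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    """Swap rows and columns of a non-empty matrix."""
    return [list(col) for col in zip(*a)]

def original(a, b):
    # Transpose applied after the product: (A B)^T
    return transpose(matmul(a, b))

def rewritten(a, b):
    # Transpose propagated to the inputs: B^T A^T
    return matmul(transpose(b), transpose(a))

def random_matrix(rng, rows, cols):
    return [[rng.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]

def probably_equivalent(f, g, trials=200, seed=0):
    """Randomized stand-in for a real equivalence proof (assumption:
    a verifier such as an SMT solver would replace this check)."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, k, n = (rng.randint(1, 4) for _ in range(3))
        a = random_matrix(rng, m, k)
        b = random_matrix(rng, k, n)
        if f(a, b) != g(a, b):
            return False  # counterexample found: not equivalent
    return True  # no counterexample found; this is evidence, not proof
```

Integer entries keep the comparison exact, so a mismatch always signals a genuinely different program rather than floating-point noise.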

Key takeaway

For AI hardware engineers who optimize low-level accelerator performance, Axon points to a different workflow. Consider evaluating superoptimizers like Axon to automate kernel generation and reduce the manual work of tiling, instruction selection, and operator fusion. Because the system selects the best-performing configuration empirically from the variants it explores, the approach could shorten development cycles and improve kernel efficiency while reducing complex, error-prone manual optimization.

Key insights

Axon automates high-performance AI accelerator kernel generation using synthesis, SMT, and empirical optimization to reduce programmer burden.

Principles

Program synthesis can automate kernel generation.
SMT ensures semantic preservation in transformations.
Empirical selection optimizes kernel performance.
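The third principle, empirical selection, can be sketched as a small harness that runs each semantically equivalent variant, checks that the outputs agree, and keeps the fastest one. This is a minimal sketch under stated assumptions: the names `select_best` and `measure_seconds` and the two toy variants are hypothetical, and a real system would measure on the target accelerator rather than with host wall-clock time.

```python
import time

def measure_seconds(kernel, args, repeats=5):
    """Best-of-N wall-clock time for one call of kernel(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best

def select_best(variants, args, measure=measure_seconds):
    """variants maps a name to a callable; returns (best_name, timings).

    The first variant's output is used as the reference; any variant
    that disagrees with it is rejected with an error.
    """
    reference = None
    timings = {}
    for name, kernel in variants.items():
        result = kernel(*args)
        if reference is None:
            reference = result
        elif result != reference:
            raise ValueError(f"variant {name!r} disagrees with the reference")
        timings[name] = measure(kernel, args)
    best = min(timings, key=timings.get)
    return best, timings

# Two toy variants that compute the same value (hypothetical examples).
def sum_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_squares_builtin(xs):
    return sum(x * x for x in xs)

if __name__ == "__main__":
    variants = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}
    name, timings = select_best(variants, (list(range(10_000)),))
    print(name, timings)
```

Passing `measure` as a parameter keeps the selection logic separate from the measurement source, so the same harness could be driven by a hardware profiler or a simulator.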

Method

Axon synthesizes target instructions from semantic specifications, discovers algebraic transformations by propagating operators through the computation graph, verifies them with SMT, lowers operations to the target ISA, and then explores tiling configurations and fuses operators to minimize memory traffic.
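The tiling step can be illustrated with a small search over tile sizes for a matrix multiply of shape M×K by K×N. The sketch below is an assumption-laden toy, not Axon's algorithm: the hardware description is a hypothetical dictionary with `scratchpad_bytes`, `elem_bytes`, and `max_tile`, and off-chip traffic is estimated with a simple output-stationary model in which A is re-read once per column tile, B once per row tile, and C is written once.

```python
from itertools import product

def candidate_tiles(dim, max_tile):
    """Powers of two that divide dim, up to max_tile."""
    tiles, t = [], 1
    while t <= min(dim, max_tile):
        if dim % t == 0:
            tiles.append(t)
        t *= 2
    return tiles

def footprint_bytes(tm, tn, tk, elem_bytes):
    """On-chip bytes for one A tile, one B tile, and one C tile."""
    return (tm * tk + tk * tn + tm * tn) * elem_bytes

def traffic_bytes(M, N, K, tm, tn, elem_bytes):
    """Estimated off-chip traffic under the output-stationary model."""
    a_reads = M * K * (N // tn)   # A re-read for every column tile of C
    b_reads = K * N * (M // tm)   # B re-read for every row tile of C
    c_writes = M * N              # each C element written once
    return (a_reads + b_reads + c_writes) * elem_bytes

def best_tiling(M, N, K, hw):
    """Return ((tm, tn, tk), traffic) or None if no tiling fits."""
    best = None
    choices = product(candidate_tiles(M, hw["max_tile"]),
                      candidate_tiles(N, hw["max_tile"]),
                      candidate_tiles(K, hw["max_tile"]))
    for tm, tn, tk in choices:
        if footprint_bytes(tm, tn, tk, hw["elem_bytes"]) > hw["scratchpad_bytes"]:
            continue  # tiles do not fit in on-chip memory
        cost = traffic_bytes(M, N, K, tm, tn, hw["elem_bytes"])
        key = (cost, -tk)  # lowest traffic first, then the largest tk
        if best is None or key < best[0]:
            best = (key, (tm, tn, tk))
    return None if best is None else (best[1], best[0][0])

if __name__ == "__main__":
    hw = {"scratchpad_bytes": 4096, "elem_bytes": 2, "max_tile": 64}
    print(best_tiling(64, 64, 64, hw))
```

In this model the reduction tile `tk` affects only the on-chip footprint, so shrinking it frees space for larger output tiles. An analytical estimate like this would typically only prune the search; the final choice would still be made by measurement.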

In practice

Automate AI accelerator kernel development.
Optimize tile-based tensor programs.
Reduce manual kernel programming effort.

Topics

Axon
Superoptimization
Tensor Programs
AI Accelerators
Program Synthesis
Kernel Optimization

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.