HTAM: Hierarchical Transition-Attended Memory for Operator Optimization

2026-05-28 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

HTAM (Hierarchical Transition-Attended Memory) is a coarse-to-fine framework for LLM-based operator optimization, targeting the high-performance GPU kernels that LLM deployment depends on. Existing methods suffer from a granularity mismatch: coarse hints are reusable but hard to turn into concrete code, while detailed memories expand the search space. HTAM addresses this by building a two-level Hierarchical Transition Graph (HTG) that organizes coarse global optimization directions, detailed local strategies, and the transition experience between optimization steps. At each evolution step, HTAM selects a global direction based on the current state and history, retrieves the relevant local strategy memory, and uses it to guide concrete CUDA code generation. Experiments on the full KernelBench suite show consistent improvements in correctness, fast-solution rate, and speedup over existing LLM-based baselines, and backend and Robust-KBench studies indicate that the benefits of the structured memory transfer to other settings.
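
The summary names three metrics without defining them. As a rough sketch, assuming the common KernelBench-style reading (a kernel counts as a fast solution when it is correct and its speedup over the baseline exceeds a threshold p), they could be computed as below. The `Result` record, the function names, the sample timings, and the choice to average speedup over correct kernels only are all illustrative assumptions, not details from the article.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical per-task record: correctness flag plus two runtimes."""
    correct: bool
    baseline_ms: float  # reference implementation runtime
    kernel_ms: float    # generated kernel runtime


def speedup(r: Result) -> float:
    # Values above 1.0 mean the generated kernel is faster than the baseline.
    return r.baseline_ms / r.kernel_ms


def correctness_rate(results: list[Result]) -> float:
    return sum(r.correct for r in results) / len(results)


def fast_rate(results: list[Result], p: float = 1.0) -> float:
    # Assumed definition: fraction of tasks that are correct AND faster than p x.
    return sum(r.correct and speedup(r) > p for r in results) / len(results)


def mean_speedup(results: list[Result]) -> float:
    # Assumption: average only over correct kernels.
    ok = [speedup(r) for r in results if r.correct]
    return sum(ok) / len(ok) if ok else 0.0


# Made-up timings, purely for illustration.
results = [
    Result(True, 2.0, 1.0),
    Result(True, 1.0, 2.0),
    Result(False, 1.0, 0.5),
    Result(True, 3.0, 1.5),
]
print(correctness_rate(results), fast_rate(results), mean_speedup(results))
```

Under this reading an incorrect kernel never counts as fast, however quick it runs, which is why correctness and fast-solution rate are reported separately.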

Key takeaway

For AI engineers optimizing GPU kernels for large language model deployment, HTAM targets the granularity mismatch in existing LLM-based operator optimization. If that mismatch is limiting your results, hierarchical memory frameworks like HTAM are worth exploring: in the reported experiments the approach consistently improved the correctness, fast-solution rate, and speedup of generated CUDA code, reducing reliance on specialized manual expertise and carrying over across backends.

Key insights

HTAM leverages hierarchical memory and transition graphs to enhance LLM-based GPU operator optimization, improving correctness and speedup.

Principles

Optimize experience hierarchically.
Structure memory for LLM guidance.
Transition graphs guide optimization steps.

Method

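HTAM constructs a two-level Hierarchical Transition Graph (HTG) that organizes global directions, local strategies, and the transition experience between optimization steps. At each evolution step it selects a global direction from the current state and history, retrieves the corresponding local strategy memory, and uses that memory to guide CUDA code generation.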
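
The select, retrieve, guide loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the class and function names, the example directions and strategies, the numeric gains, and the selection rule (prefer untried directions, then rank by mean recorded gain of the transition) are all hypothetical, and a plain score lookup stands in for whatever attention mechanism HTAM actually uses.

```python
from dataclasses import dataclass, field

START = "<start>"  # hypothetical marker meaning "no previous step"


@dataclass
class HierarchicalTransitionGraph:
    # Level 1 -> level 2: coarse direction name -> detailed local strategies.
    strategies: dict[str, list[str]] = field(default_factory=dict)
    # Transition experience: (previous direction, next direction) -> observed gains.
    gains: dict[tuple[str, str], list[float]] = field(default_factory=dict)

    def add_direction(self, name: str, local: list[str]) -> None:
        self.strategies[name] = list(local)

    def record(self, prev: str, nxt: str, gain: float) -> None:
        self.gains.setdefault((prev, nxt), []).append(gain)

    def score(self, prev: str, nxt: str) -> float:
        # Mean recorded gain for this transition, 0.0 if never observed.
        observed = self.gains.get((prev, nxt), [])
        return sum(observed) / len(observed) if observed else 0.0

    def select_direction(self, history: list[str]) -> str:
        prev = history[-1] if history else START
        # Assumed rule: untried directions first, then highest mean gain.
        return max(self.strategies,
                   key=lambda d: (d not in history, self.score(prev, d)))

    def retrieve(self, direction: str) -> list[str]:
        return self.strategies[direction]


def build_prompt(kernel_src: str, direction: str, local: list[str]) -> str:
    # Turn the retrieved memory into guidance text for the code-generating LLM.
    hints = "\n".join(f"- {s}" for s in local)
    return (f"Optimization direction: {direction}\n"
            f"Local strategies:\n{hints}\n\nCurrent kernel:\n{kernel_src}")


htg = HierarchicalTransitionGraph()
htg.add_direction("memory_access",
                  ["coalesce global loads", "stage tiles in shared memory"])
htg.add_direction("parallelism",
                  ["tune block size", "use warp-level reduction"])
htg.record(START, "memory_access", 1.4)
htg.record(START, "parallelism", 1.1)
htg.record("memory_access", "parallelism", 1.3)

history: list[str] = []
first = htg.select_direction(history)
history.append(first)
second = htg.select_direction(history)
prompt = build_prompt("__global__ void k() {}", second, htg.retrieve(second))
print(first, second)
```

Keeping the coarse directions and the local strategies in separate levels means the selection step searches only a handful of directions, and the detailed strategies enter the prompt only after a direction has been chosen.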

In practice

Implement HTAM for GPU kernel optimization.
Structure LLM guidance with HTG.
Enhance LLM-generated CUDA code.

Topics

GPU Kernel Optimization
LLM Code Generation
Hierarchical Memory
Operator Optimization
CUDA Programming
KernelBench

Best for: Research Scientist, Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.