ATOM: Unlocking Extreme AMD Instinct Inference with Software-Hardware Co-Optimization

· Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Advanced, medium

Summary

ATOM (AiTer Optimized Model) is an inference engine designed to maximize efficiency for LLM serving on AMD Instinct™ GPUs, addressing challenges like high concurrency and multi-GPU deployment. It operates as the system-level inference engine within the AMD AI stack, orchestrating execution while leveraging AITER for kernel acceleration and MoRI for distributed communication. ATOM supports standalone serving with OpenAI-compatible APIs and integrates with vLLM and SGLang ecosystems. Its architecture coordinates scheduling, KV cache management, and various parallelism strategies (TP/DP/EP). Key features include continuous batching, prefix caching, Level 3 compilation, FP8/MXFP4/INT8/INT4 quantization, and MTP speculative decoding. ATOM covers major model families like Llama, Qwen, DeepSeek, and Mixtral, optimizing for Dense, MoE, and inference-enhanced workloads. A public benchmark dashboard and official recipes aid deployment and tuning.
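
The summary lists prefix caching among ATOM's features but does not describe how it works. Below is a minimal sketch of the general technique, not ATOM's actual code. It assumes the KV cache is split into fixed-size blocks and that each full block is identified by a hash chained with the previous block's hash. Under that scheme, two requests that share a prompt prefix (for example, a common system prompt) resolve to the same physical blocks and skip recomputing them. `BLOCK_SIZE`, `block_hashes`, and `PrefixCache` are hypothetical names chosen for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical block size; real engines typically use larger blocks (e.g. 16 tokens).
BLOCK_SIZE = 4


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each *full* block, chaining in the previous hash so that a block's
    hash identifies the whole prefix up to and including that block."""
    hashes: List[str] = []
    prev = ""
    full_len = len(tokens) - len(tokens) % block_size  # ignore the partial tail block
    for start in range(0, full_len, block_size):
        chunk = tokens[start:start + block_size]
        payload = prev + ":" + ",".join(map(str, chunk))
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes


class PrefixCache:
    """Toy table mapping a chained block hash to a physical KV-block id (hypothetical)."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}
        self.next_block_id = 0

    def allocate(self, tokens: List[int]) -> Tuple[List[int], int]:
        """Return (block ids for the prompt's full blocks, number of reused blocks)."""
        ids: List[int] = []
        reused = 0
        for h in block_hashes(tokens):
            if h in self.table:
                reused += 1  # prefix already cached: reuse KV, skip prefill for this block
            else:
                self.table[h] = self.next_block_id
                self.next_block_id += 1
            ids.append(self.table[h])
        return ids, reused


cache = PrefixCache()
system = [1, 2, 3, 4, 5, 6, 7, 8]  # shared "system prompt": two full blocks
ids_a, reused_a = cache.allocate(system + [9, 10, 11, 12])
ids_b, reused_b = cache.allocate(system + [20, 21, 22, 23])
print(ids_a, reused_a)  # [0, 1, 2] 0
print(ids_b, reused_b)  # [0, 1, 3] 2
```

Chaining the previous hash into each block is what makes a hit on block *i* imply that the entire prefix before it also matches, so the lookup never confuses identical blocks that follow different prefixes.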

Key takeaway

For AI engineers deploying LLMs on AMD Instinct GPUs, ATOM offers a unified, high-performance inference engine. Use ATOM directly for optimized execution across Dense, MoE, and MTP-enabled models, or treat its architecture and recipes as a reference when tuning other frameworks. Either approach can stabilize throughput and reduce per-model optimization overhead.

Key insights

ATOM is a ROCm-first, co-optimized inference engine for extreme LLM performance on AMD Instinct GPUs.

Method

ATOM orchestrates LLM inference by dispatching requests through an LLMEngine to one or more EngineCore instances. Within each EngineCore, a Scheduler manages batching and a ModelRunner executes forward passes using optimized AITER kernels and the configured parallelism strategies.
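
The request flow from LLMEngine through EngineCore, Scheduler, and ModelRunner can be illustrated with a toy continuous-batching loop. The class names mirror the components named in the summary, but every method, field, and the fake `forward` below are hypothetical. The real engine runs AITER kernels across TP/DP/EP ranks and may drive several EngineCore instances; this sketch uses a single core on the CPU and only shows how finished requests free batch slots that waiting requests fill on the very next step.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Request:
    req_id: int
    prompt: List[int]
    max_new_tokens: int
    output: List[int] = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return len(self.output) >= self.max_new_tokens


class Scheduler:
    """Hypothetical continuous-batching scheduler with a fixed number of slots."""

    def __init__(self, max_batch: int) -> None:
        self.max_batch = max_batch
        self.waiting: Deque[Request] = deque()
        self.running: List[Request] = []

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def schedule(self) -> List[Request]:
        # Admit waiting requests into any free slot before every step.
        while self.waiting and len(self.running) < self.max_batch:
            self.running.append(self.waiting.popleft())
        return list(self.running)

    def update(self) -> List[Request]:
        # Retire finished requests so their slots are reusable next step.
        done = [r for r in self.running if r.finished]
        self.running = [r for r in self.running if not r.finished]
        return done


class ModelRunner:
    """Stand-in for the GPU forward pass: emits one fake token per request."""

    def forward(self, batch: List[Request]) -> List[int]:
        return [r.prompt[-1] + len(r.output) + 1 for r in batch]


class EngineCore:
    def __init__(self, max_batch: int) -> None:
        self.scheduler = Scheduler(max_batch)
        self.runner = ModelRunner()

    def step(self) -> List[Request]:
        batch = self.scheduler.schedule()
        if not batch:
            return []
        for req, tok in zip(batch, self.runner.forward(batch)):
            req.output.append(tok)
        return self.scheduler.update()


class LLMEngine:
    def __init__(self, max_batch: int = 2) -> None:
        self.core = EngineCore(max_batch)
        self.steps = 0

    def generate(self, requests: List[Request]) -> Dict[int, List[int]]:
        for r in requests:
            self.core.scheduler.add(r)
        results: Dict[int, List[int]] = {}
        while len(results) < len(requests):
            for r in self.core.step():
                results[r.req_id] = r.output
            self.steps += 1
        return results


engine = LLMEngine(max_batch=2)
reqs = [Request(0, [10], 1), Request(1, [20], 3), Request(2, [30], 2)]
out = engine.generate(reqs)
print(out, engine.steps)  # {0: [11], 1: [21, 22, 23], 2: [31, 32]} 3
```

Here request 2 joins the batch at step 2, as soon as request 0 finishes, so all three requests complete in 3 steps. A static batch of size 2 that waits for its slowest member would need 5 steps (3 for requests 0 and 1, then 2 for request 2).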

Best for: Machine Learning Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.