Reproducing the AMD MLPerf Inference v6.0 Submission Result

2026-04-01 · Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, long

Summary

AMD's MLPerf Inference v6.0 submission is the company's fourth round of MLPerf Inference results, and it targets complex generative inference workloads. This guide details how to reproduce AMD's results on single-node and multi-node configurations built on the AMD Instinct MI355X platform. The submissions covered models such as Llama 2 70B, gpt-oss-120b, and Wan-2.2-T2V-A14B, using the WMXFP4 and BF16 datatypes. Nine AMD partners also submitted results on Instinct platforms in the "Available" category. Reproduction requires an AMD Instinct MI355X platform, ROCm 7.1.0 or later, and a supported Linux distribution. The process consists of preparing Docker containers, downloading the models and datasets, and running the benchmarking scripts for the offline, server, and interactive scenarios.
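
The ROCm requirement is worth checking before pulling any containers. The Python sketch below assumes the installed version string lives at `/opt/rocm/.info/version`, which is common on standard ROCm installs but should be verified on your system; the parser ignores any build suffix such as `-38`.

```python
from pathlib import Path

# Minimum ROCm version stated in the reproduction requirements.
MIN_ROCM = (7, 1, 0)


def parse_rocm_version(text: str) -> tuple[int, int, int]:
    """Parse a string such as '7.1.0-38' into (7, 1, 0), dropping the build suffix."""
    core = text.strip().split("-")[0]
    parts = [int(p) for p in core.split(".")[:3]]
    while len(parts) < 3:  # treat '7.2' as '7.2.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(text: str, minimum=MIN_ROCM) -> bool:
    """True if the parsed version is at least the required minimum."""
    return parse_rocm_version(text) >= minimum


def installed_rocm_version(path="/opt/rocm/.info/version"):
    """Return the raw version string (assumed file location), or None if absent."""
    p = Path(path)
    return p.read_text() if p.exists() else None


if __name__ == "__main__":
    found = installed_rocm_version()
    if found is None:
        print("ROCm version file not found; check your install path.")
    else:
        status = "OK" if meets_minimum(found) else "older than 7.1.0"
        print(f"ROCm {found.strip()}: {status}")
```

Comparing tuples rather than strings avoids the classic bug where "7.10" sorts before "7.9".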

Key takeaway

For AI engineers evaluating AMD Instinct MI355X platforms for large language model inference, this guide provides the steps needed to validate AMD's MLPerf Inference v6.0 performance claims. Follow the Docker, model download, and benchmarking scripts to independently verify performance and accuracy for Llama 2 70B and gpt-oss-120b in single-node and multi-node setups, and to confirm that the platform meets your specific workload requirements.

Key insights

AMD provides a guide to reproduce its MLPerf Inference v6.0 results on Instinct MI355X platforms.

Principles

Reproducibility is key for benchmark validation.
Multi-node inference scales throughput beyond a single server.
Low-precision quantization (WMXFP4) reduces the memory footprint and compute cost of large language models.
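
WMXFP4 builds on the MXFP4 format listed under Topics. Under the OCP Microscaling (MX) v1.0 specification, MXFP4 stores each block of 32 values as 4-bit FP4 (E2M1) elements plus one shared 8-bit power-of-two (E8M0) scale, roughly 4.25 bits per value. The pure-Python sketch below illustrates the spec's basic scale-and-round recipe; it is not AMD's quantization tooling and makes no claim about how the published checkpoints were calibrated.

```python
import math

# FP4 E2M1 magnitudes from the OCP MX v1.0 spec.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EVEN = {0.0, 1.0, 2.0, 4.0}  # grid points whose mantissa bit is 0
E2M1_EMAX = 2                      # exponent of the largest normal (6 = 1.5 * 2**2)
BLOCK_SIZE = 32                    # elements sharing one scale in MXFP4


def round_e2m1(x):
    """Round to the nearest E2M1 value, ties to even, saturating at +/-6."""
    mag = min(abs(x), 6.0)
    # On a distance tie, False (even) sorts before True (odd).
    best = min(E2M1_GRID, key=lambda g: (abs(g - mag), g not in E2M1_EVEN))
    return math.copysign(best, x)


def quantize_block(values):
    """Return (shared_exp, fp4_elements) for one block; the scale is 2**shared_exp."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # Largest power of two in the block minus the element emax, clamped to
    # the exponent range an E8M0 scale can represent.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    return shared_exp, [round_e2m1(v / scale) for v in values]


def mxfp4_quantize(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each one independently."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def mxfp4_dequantize(blocks):
    """Multiply each block's FP4 elements back by its shared scale."""
    out = []
    for shared_exp, elems in blocks:
        scale = 2.0 ** shared_exp
        out.extend(scale * e for e in elems)
    return out
```

Because the scale is a pure power of two, dequantization is an exponent adjustment rather than a general multiply, which is part of what makes the format cheap in hardware.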

Method

The reproduction method involves three steps: preparing a Docker container, downloading the reference model and dataset, and executing benchmarking scripts for various scenarios (offline, server, interactive).
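
The three scenarios differ mainly in how queries arrive and which latency bounds apply. As we understand the MLPerf rules, Offline makes every sample available at once and measures throughput, Server issues queries as a Poisson process at a target rate, and Interactive uses the same arrival pattern with tighter time-to-first-token (TTFT) and time-per-output-token (TPOT) limits. The sketch below only imitates those traffic patterns; real runs are driven by the MLPerf LoadGen library. The latency limits are parameters, not official values, and the p99 nearest-rank check is an assumption; take the actual constraints from the MLPerf Inference rules for each model.

```python
import math
import random


def issue_times(scenario, num_queries, target_qps=1.0, seed=0):
    """Return the time (in seconds) at which each query would be issued."""
    if scenario == "offline":
        # Offline: every sample is available at t = 0; only throughput matters.
        return [0.0] * num_queries
    if scenario in ("server", "interactive"):
        # Server/Interactive: Poisson arrivals, i.e. exponential gaps with mean 1/qps.
        rng = random.Random(seed)
        t, times = 0.0, []
        for _ in range(num_queries):
            t += rng.expovariate(target_qps)
            times.append(t)
        return times
    raise ValueError(f"unknown scenario: {scenario}")


def percentile(samples, pct):
    """Nearest-rank percentile, e.g. pct=99 for a p99 latency."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency(ttft_s, tpot_s, ttft_limit_s, tpot_limit_s, pct=99):
    """Check per-query TTFT and TPOT samples against limits at a percentile."""
    return (percentile(ttft_s, pct) <= ttft_limit_s
            and percentile(tpot_s, pct) <= tpot_limit_s)
```

Under this model, a system can post high Offline throughput yet fail Interactive, because batching more requests raises TPOT even when total tokens per second goes up.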

In practice

Use `docker pull` to get specific MLPerf images.
Clone quantized models from Hugging Face.
Run `main.py` for performance and accuracy tests.
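
The commands above can be laid out as a dry-run plan that prints each step before anything executes. Every name in capitals (`MLPERF_IMAGE:TAG`, `ORG/QUANTIZED_MODEL`, `BENCH_ARGS`) is a hypothetical placeholder for the values published in the original AMD post, and `main.py` is assumed to sit in the container's working directory. The `--device=/dev/kfd --device=/dev/dri --group-add video` flags are the standard way to expose AMD GPUs to a ROCm container; no `main.py` arguments are invented here.

```python
import shlex
import subprocess

# Hypothetical placeholders, not real names: substitute the image tag, model
# repository, and benchmark arguments from the AMD ROCm blog post.
IMAGE = "MLPERF_IMAGE:TAG"
MODEL_REPO = "https://huggingface.co/ORG/QUANTIZED_MODEL"
MODEL_DIR = "/data/models/QUANTIZED_MODEL"
BENCH_ARGS = ["BENCH_ARGS"]  # placeholder for main.py's real arguments


def reproduction_plan(image, model_repo, model_dir, bench_args):
    """Return the workflow as argument lists; nothing is executed here."""
    return [
        # Step 1: fetch the MLPerf container image.
        ["docker", "pull", image],
        # Step 2: Hugging Face model repos store large weights with Git LFS.
        ["git", "lfs", "install"],
        ["git", "clone", model_repo, model_dir],
        # Step 3: run the benchmark in the container with ROCm device access;
        # assumes main.py is in the image's working directory.
        ["docker", "run", "--rm",
         "--device=/dev/kfd", "--device=/dev/dri", "--group-add", "video",
         "-v", f"{model_dir}:{model_dir}",
         image, "python", "main.py", *bench_args],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for cmd in plan:
        line = shlex.join(cmd)
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    run_plan(reproduction_plan(IMAGE, MODEL_REPO, MODEL_DIR, BENCH_ARGS))
```

Keeping commands as argument lists and quoting them with `shlex.join` only for display avoids shell-injection surprises if a path ever contains spaces.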

Topics

MLPerf Inference v6.0
AMD Instinct MI355X
Llama 2 70B
gpt-oss-120b
MXFP4 Datatype

Best for: Machine Learning Engineer, AI Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.