Reproducing AMD MLPerf Training v6.0 Submission Result

2026-06-16 · Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, long

Summary

AMD has released a step-by-step guide for reproducing its MLPerf Training v6.0 submission results on AMD Instinct MI325X, MI350X, and MI355X GPUs. The guide covers three benchmarks: Llama 2 70B LoRA fine-tuning, Llama 3.1 8B pretraining, and Flux.1-schnell text-to-image training. For the two LLM benchmarks, AMD used its Primus training framework, which abstracts over Megatron-LM and TorchTitan. Reproduction requires ROCm 7.2.2 or later, Docker, and benchmark-specific disk space (for example, 6 TB for Flux.1-schnell). For each benchmark, the article walks through environment setup, dataset preparation, training configuration, execution, and result validation. The expected times to train are 8.27 minutes on MI355X and 10.25 minutes on MI350X for Llama 2 70B LoRA, 86.84 minutes on MI355X and 109.76 minutes on MI350X for Llama 3.1 8B pretraining, and 92.36 minutes on an 8-node MI325X configuration for Flux.1-schnell.
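
To put the quoted times side by side, the short Python sketch below computes how much longer MI350X takes than MI355X on the two LLM benchmarks. The numbers are the ones quoted in the summary; the dictionary layout and the helper name `time_ratio` are illustrative only, and the comparison assumes both GPUs were run at the same node count, which this summary does not state. The Flux.1-schnell figure is left out because it is reported for a single platform.

```python
# Expected times to train, in minutes, as quoted in the summary.
expected_minutes = {
    "Llama 2 70B LoRA": {"MI355X": 8.27, "MI350X": 10.25},
    "Llama 3.1 8B pretraining": {"MI355X": 86.84, "MI350X": 109.76},
}


def time_ratio(slower: float, faster: float) -> float:
    """Return how many times longer the slower run takes than the faster one."""
    return slower / faster


for benchmark, minutes in expected_minutes.items():
    ratio = time_ratio(minutes["MI350X"], minutes["MI355X"])
    print(f"{benchmark}: MI350X takes {ratio:.2f}x as long as MI355X")
```

On these figures the gap is roughly a quarter in both cases.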

Key takeaway

For AI engineers evaluating AMD Instinct GPUs for large-scale model training, this guide offers a clear path to validating AMD's MLPerf Training v6.0 performance claims. Follow its environment-setup and dataset-preparation steps to reproduce the Llama 2 70B LoRA, Llama 3.1 8B, or Flux.1-schnell benchmark. Your team can use the Primus framework to streamline the LLM workflows, and can keep result validation MLPerf-compliant by averaging 8 of 10 runs.

Key insights

AMD provides a detailed guide to reproduce its MLPerf Training v6.0 benchmark results on Instinct GPUs using the Primus framework.

Principles

MLPerf results require rigorous, multi-run validation.
Unified frameworks simplify large-scale model training.
Hardware-specific configurations optimize performance.

Method

The reproduction method involves setting up a Docker environment with ROCm 7.2.2+, preparing specific datasets, configuring platform-specific hyperparameters, and executing training runs, followed by MLPerf-compliant result validation.
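
Before starting the environment setup, it can help to check the stated prerequisites. The sketch below is a minimal pre-flight check in Python, not part of AMD's guide: the function names are hypothetical, the ROCm version string is passed in by the caller rather than discovered automatically, and 1 TB is assumed to mean 10^12 bytes.

```python
import shutil

MIN_ROCM = (7, 2, 2)   # from the guide: ROCm 7.2.2 or later
FLUX_DISK_TB = 6       # from the guide: 6 TB for Flux.1-schnell


def parse_version(text: str) -> tuple:
    """Turn '7.2.2' or '7.2.2-43' into (7, 2, 2); missing parts become 0."""
    core = text.strip().split("-")[0]
    parts = [int(p) for p in core.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def rocm_ok(version_text: str, minimum: tuple = MIN_ROCM) -> bool:
    """True if the given ROCm version string meets the minimum."""
    return parse_version(version_text) >= minimum


def has_free_space(path: str, required_tb: float) -> bool:
    """True if the filesystem holding `path` has at least `required_tb` free.

    Assumes decimal terabytes (10**12 bytes).
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_tb * 10**12


if __name__ == "__main__":
    # Example values; replace the version string with the one on your system.
    print("ROCm version OK:", rocm_ok("7.2.2"))
    print("Docker on PATH:", shutil.which("docker") is not None)
    print("Disk space OK for Flux.1-schnell:", has_free_space(".", FLUX_DISK_TB))
```

Comparing versions as integer tuples rather than strings avoids the usual pitfall where "7.10.0" sorts before "7.2.2".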

In practice

Use primus-cli for LLM pretraining/fine-tuning.
Download pre-tokenized datasets for efficiency.
Average 8 of 10 runs for MLPerf scores.
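
The averaging step in the last item can be sketched as follows. The source says only that 8 of 10 runs are averaged; this sketch assumes the two excluded runs are the fastest and the slowest, which should be confirmed against the MLPerf Training rules before use. The function name `mlperf_score` and the sample run times are hypothetical, not AMD's results.

```python
from statistics import mean


def mlperf_score(run_minutes: list[float], keep: int = 8) -> float:
    """Average the middle `keep` runs after dropping the fastest and slowest.

    Assumption: exactly keep + 2 runs are supplied and one run is trimmed
    from each end of the sorted list.
    """
    if len(run_minutes) != keep + 2:
        raise ValueError(f"expected {keep + 2} runs, got {len(run_minutes)}")
    trimmed = sorted(run_minutes)[1:-1]  # drop the minimum and the maximum
    return mean(trimmed)


# Hypothetical times to train, in minutes, for ten runs.
runs = [8.1, 8.2, 8.3, 8.2, 8.4, 8.3, 8.2, 8.3, 8.5, 9.0]
print(f"Score: {mlperf_score(runs):.2f} minutes")
```

Trimming both extremes keeps a single unusually slow or fast run from moving the reported score.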

Topics

MLPerf Training v6.0
AMD Instinct GPUs
Primus Training Framework
LLM Training
Text-to-Image Models
Benchmark Reproduction

Best for: Machine Learning Engineer, AI Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.