Reproducing AMD MLPerf Training v6.0 Submission Result

Source: AMD ROCm Blogs · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure) · Depth: Intermediate, long

Summary

AMD has released a step-by-step guide for reproducing its MLPerf Training v6.0 submission results on AMD Instinct MI325X, MI350X, and MI355X GPUs. The guide covers three benchmarks: Llama 2 70B LoRA fine-tuning, Llama 3.1 8B pretraining, and Flux.1-schnell text-to-image training. The LLM benchmarks use AMD's Primus training framework, which abstracts over Megatron-LM and TorchTitan. Reproduction requires ROCm 7.2.2 or later, Docker, and benchmark-specific disk space (for example, 6 TB for Flux.1-schnell). For each benchmark, the article walks through environment setup, dataset preparation, training configuration, execution, and result validation. Expected scores, in minutes of training time, are 8.27 on MI355X and 10.25 on MI350X for Llama 2 70B LoRA; 86.84 on MI355X and 109.76 on MI350X for Llama 3.1 8B pretraining; and 92.36 on an 8-node MI325X configuration for Flux.1-schnell.
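Before starting a reproduction, it can help to confirm that the host meets the stated prerequisites. The following is a minimal preflight sketch, not part of AMD's guide: the helper names, the dataset path, and the reading of "6 TB" as decimal terabytes are all assumptions made for illustration. It checks only free disk space and whether a `docker` executable is on PATH; it does not verify the ROCm version.

```python
import shutil

# Assumption: "TB" is read as a decimal terabyte (10**12 bytes).
TB = 10**12


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_room(path: str, required_tb: float) -> bool:
    """Return True if `path` has at least `required_tb` terabytes free."""
    return free_bytes(path) >= required_tb * TB


def docker_on_path() -> bool:
    """Return True if a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    data_dir = "."  # hypothetical dataset location; replace with your own
    print("docker found:", docker_on_path())
    print("6 TB free for Flux.1-schnell:", has_room(data_dir, 6))
```

Running the check against the actual dataset directory, rather than the current directory, gives a more meaningful answer when data lives on a separate volume.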

Key takeaway

For AI engineers evaluating AMD Instinct GPUs for large-scale model training, this guide offers a clear path to validating AMD's MLPerf Training v6.0 performance claims. Follow the environment setup and dataset preparation steps to reproduce the Llama 2 70B LoRA, Llama 3.1 8B, or Flux.1-schnell benchmark. Primus streamlines the LLM workflows, and averaging 8 of 10 runs keeps result validation MLPerf-compliant.

Key insights

AMD provides a detailed guide to reproduce its MLPerf Training v6.0 benchmark results on Instinct GPUs using the Primus framework.

Method

The reproduction method involves setting up a Docker environment with ROCm 7.2.2+, preparing specific datasets, configuring platform-specific hyperparameters, and executing training runs, followed by MLPerf-compliant result validation.
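The result-validation step can be sketched in a few lines. This example assumes the usual MLPerf Training convention of discarding the fastest and slowest runs before averaging, which is how 10 runs reduce to 8; confirm the run count and trimming rule against the current MLPerf rules for your benchmark. The function name and the run times are hypothetical and are not taken from AMD's submission.

```python
from statistics import mean


def olympic_mean(minutes: list[float]) -> float:
    """Average the run times after dropping the single fastest and slowest run.

    Assumption: this mirrors MLPerf-style scoring, where 10 runs yield an
    average over the middle 8.
    """
    if len(minutes) < 3:
        raise ValueError("need at least 3 runs to drop the fastest and slowest")
    trimmed = sorted(minutes)[1:-1]  # discard one minimum and one maximum
    return mean(trimmed)


if __name__ == "__main__":
    # Hypothetical time-to-train values in minutes, for illustration only.
    runs = [8.1, 8.2, 8.3, 8.4, 8.2, 8.3, 8.5, 8.0, 8.3, 8.2]
    print(f"score over {len(runs) - 2} of {len(runs)} runs: {olympic_mean(runs):.2f} min")
```

Trimming both ends makes the score less sensitive to a single unusually fast or slow convergence, which is why comparing a lone run against the expected figure can be misleading.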

Best for: Machine Learning Engineer, AI Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.