Reproducing the AMD MLPerf Inference v6.0 Submission Result
Summary
AMD's MLPerf Inference v6.0 submission, the company's fourth round, covers models such as Llama 2 70B, gpt-oss-120b, and Wan-2.2-T2V-A14B, using the MXFP4 and BF16 datatypes. This guide details how to reproduce AMD's results on various systems, including multi-node configurations using the AMD Instinct MI355X platform. Nine AMD partners also submitted results on Instinct platforms in the "Available" category. Reproduction requires an AMD Instinct MI355X platform, ROCm 7.1.0 or later, and a supported Linux distribution. The workflow consists of preparing Docker containers, downloading models and datasets, and running benchmarking scripts for the offline, server, and interactive scenarios.
Key takeaway
For AI engineers evaluating AMD Instinct MI355X platforms for large language model inference, this guide provides the steps needed to validate AMD's MLPerf Inference v6.0 performance claims. Follow the Docker setup, model download, and benchmarking instructions to independently verify performance and accuracy for Llama 2 70B and gpt-oss-120b across single-node and multi-node setups, and to confirm that the platform meets your workload requirements.
Key insights
AMD provides a guide to reproduce its MLPerf Inference v6.0 results on Instinct MI355X platforms.
Principles
- Reproducibility is key for benchmark validation.
- Multi-node inference scales performance beyond a single system.
- Quantization (MXFP4) reduces the memory footprint and compute cost of large language models.
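To make the quantization principle concrete, here is a minimal sketch of block-wise MXFP4 quantization in plain Python. It assumes the OCP Microscaling layout (32 elements per block, one shared power-of-two scale, FP4 E2M1 elements) and is not taken from AMD's submission code. The names `quantize_block` and `dequantize_block` are hypothetical, and the sketch simplifies rounding (nearest value, first match on ties) and skips clamping the shared exponent to its 8-bit range.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # number of elements that share one scale
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6 = 1.5 * 2**2


def quantize_block(block):
    """Return (shared_exponent, quantized_elements) for one block of floats."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # Choose the power-of-two scale so the largest magnitude lands in E2M1 range.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    quantized = []
    for v in block:
        # Scale down, saturate at the largest E2M1 value, snap to the grid.
        mag = min(abs(v) / scale, E2M1_VALUES[-1])
        nearest = min(E2M1_VALUES, key=lambda q: abs(q - mag))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Reconstruct approximate floats from a quantized block."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


if __name__ == "__main__":
    block = [6.0, 3.0, -1.0, 0.5] + [0.0] * (BLOCK_SIZE - 4)
    exp, q = quantize_block(block)
    print(exp, dequantize_block(exp, q)[:4])
```

Each element needs only 4 bits plus a share of one 8-bit scale per block, which is where the memory saving relative to BF16 comes from. Values that do not fall on the scaled E2M1 grid are rounded, so accuracy has to be checked alongside performance.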
Method
The reproduction method involves three steps: preparing a Docker container, downloading the reference model and dataset, and executing benchmarking scripts for various scenarios (offline, server, interactive).
In practice
- Use `docker pull` to get specific MLPerf images.
- Clone quantized models from Hugging Face.
- Run `main.py` for performance and accuracy tests.
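The three steps above can be laid out as an ordered command plan. The sketch below only builds and prints the commands; it does not run them. The helper names `build_plan` and `render_script` are hypothetical, the image name is a placeholder because this summary does not give the exact image tag, and the per-scenario arguments to `main.py` are left empty because they are not listed here (take them from the original AMD article). Dataset download is omitted for the same reason.

```python
import shlex

# Scenarios named in the guide.
SCENARIOS = ("offline", "server", "interactive")


def build_plan(image, model_repo, scenario_args=None):
    """Return an ordered list of (label, argv) pairs for the reproduction steps.

    scenario_args maps a scenario name to the extra main.py arguments for it.
    Those arguments are an assumption left for the reader to fill in.
    """
    scenario_args = scenario_args or {}
    plan = [
        ("prepare container", ["docker", "pull", image]),
        ("download model", ["git", "clone", f"https://huggingface.co/{model_repo}"]),
    ]
    for scenario in SCENARIOS:
        argv = ["python", "main.py", *scenario_args.get(scenario, [])]
        plan.append((f"benchmark ({scenario})", argv))
    return plan


def render_script(plan):
    """Render the plan as shell-safe lines, one comment plus one command per step."""
    lines = []
    for label, argv in plan:
        lines.append(f"# {label}")
        lines.append(shlex.join(argv))
    return "\n".join(lines)


if __name__ == "__main__":
    plan = build_plan(
        image="IMAGE_FROM_ORIGINAL_ARTICLE",  # placeholder, not a real image name
        model_repo="amd/gpt-oss-120b-w-mxfp4-a-fp8-Mlperf",
    )
    print(render_script(plan))
```

Cloning a model repository from Hugging Face with `git` typically needs Git LFS installed so that the weight files are fetched rather than pointer stubs. Keeping the plan as data makes it easy to review every command before running anything on the target system.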
Topics
- MLPerf Inference v6.0
- AMD Instinct MI355X
- Llama 2 70B
- gpt-oss-120b
- MXFP4 Datatype
Code references
- mlcommons/inference
- amd/Llama-2-70b-chat-hf-WMXFP4-AMXFP4-KVFP8-Scale-UINT8-6.0MLPerf-GPTQ
- amd/gpt-oss-120b-w-mxfp4-a-fp8-Mlperf
Best for: Machine Learning Engineer, AI Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.