MLPerf Training v6.0: Lambda delivers fastest LLM training on NVIDIA GB300 NVL72 and fastest MoE training on NVIDIA HGX B200

· Source: The Lambda Deep Learning Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, short

Summary

Lambda posted leading results in the MLPerf Training v6.0 benchmarks for large language model (LLM) and mixture-of-experts (MoE) training. On the NVIDIA GB300 NVL72 system, Lambda's Llama 3.1 8B training completed in 11.59 minutes, an 18.7% improvement over its previous v5.1 result and the fastest among GB300 NVL72 submissions. For MoE models, Lambda recorded the fastest single-node HGX B200 result for GPT-OSS-20B at 96.46 minutes, and a competitive 18.35 minutes on GB300 NVL72. MLPerf Training v6.0 expanded its suite to include MoE models such as OpenAI's GPT-OSS-20B, bringing the benchmark closer to real-world AI training workloads. These gains are attributed to NVIDIA Blackwell Ultra innovations, including 279GB HBM3e per GPU, enhanced memory bandwidth, and improved interconnects, combined with Lambda's cluster design and optimized NVIDIA NeMo software stack.
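The figures above can be put in context with a little arithmetic. The sketch below is illustrative only: the summary does not say whether "18.7% improvement" means a shorter run time or higher throughput, so both readings are computed, and the implied v5.1 time is an estimate under each assumption, not a reported result. The function and constant names are hypothetical. The GPT-OSS-20B ratio compares wall-clock time only; the summary gives no GPU counts for either submission, so it is not a per-GPU comparison.

```python
# Reported figures from the summary (minutes to train).
LLAMA_GB300_MIN = 11.59      # Llama 3.1 8B on GB300 NVL72, v6.0
IMPROVEMENT_PCT = 18.7       # stated improvement over the v5.1 result
GPT_OSS_B200_MIN = 96.46     # GPT-OSS-20B on single-node HGX B200
GPT_OSS_GB300_MIN = 18.35    # GPT-OSS-20B on GB300 NVL72


def baseline_if_time_reduced(new_minutes: float, pct: float) -> float:
    """Implied earlier time, ASSUMING the new time is pct percent shorter."""
    return new_minutes / (1.0 - pct / 100.0)


def baseline_if_speedup(new_minutes: float, pct: float) -> float:
    """Implied earlier time, ASSUMING the new run is pct percent faster
    in the throughput sense (old_time / new_time = 1 + pct/100)."""
    return new_minutes * (1.0 + pct / 100.0)


# Two estimates of the v5.1 time, one per interpretation (not reported data).
est_time_reduced = baseline_if_time_reduced(LLAMA_GB300_MIN, IMPROVEMENT_PCT)
est_speedup = baseline_if_speedup(LLAMA_GB300_MIN, IMPROVEMENT_PCT)

# Wall-clock ratio between the two GPT-OSS-20B results; the systems differ
# in size, so this is not normalized per GPU.
wall_clock_ratio = GPT_OSS_B200_MIN / GPT_OSS_GB300_MIN

print(f"Implied v5.1 time (time-reduction reading): {est_time_reduced:.2f} min")
print(f"Implied v5.1 time (speedup reading):        {est_speedup:.2f} min")
print(f"GPT-OSS-20B wall-clock ratio (B200 / GB300): {wall_clock_ratio:.2f}x")
```

The two readings differ by roughly half a minute, which is why the definition of "improvement" matters when comparing benchmark rounds.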

Key takeaway

For AI architects evaluating infrastructure for large-scale LLM or MoE training, Lambda's MLPerf v6.0 results provide performance figures measured under standardized benchmark conditions. Faster time-to-train on NVIDIA GB300 NVL72 systems with optimized software such as NVIDIA NeMo can add up over long training runs, potentially saving days or weeks. Consider Lambda's 1-Click Clusters for benchmarking new architectures or scaling production runs, so that hardware decisions rest on reproducible, industry-standard results.

Key insights

Optimized hardware and software stacks significantly accelerate LLM and MoE model training convergence.

Best for: MLOps Engineer, AI Engineer, NLP Engineer, Machine Learning Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Lambda Deep Learning Blog.