YuanLab AI Releases Yuan 3.0 Ultra: A Flagship Multimodal MoE Foundation Model, Built for Stronger Intelligence and Unrivaled Efficiency

2026-03-05 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, quick

Summary

YuanLab AI has released Yuan 3.0 Ultra, a trillion-parameter open-source multimodal Mixture-of-Experts (MoE) foundation model. During pre-training the model was cut from 1.5T to 1T total parameters, a 33.3% reduction, while pre-training efficiency rose by 49%. Both gains are attributed to its Layer-Adaptive Expert Pruning (LAEP) algorithm, which prunes underutilized experts during pre-training. An Expert Rearranging algorithm further minimizes device-level token variance, yielding a computational throughput of 92.6 TFLOPS per GPU. Yuan 3.0 Ultra also incorporates a Reflection Inhibition Reward Mechanism (RIRM) to promote concise reasoning, and reports leading accuracy on enterprise benchmarks such as Docmatix (67.4%), ChatRAG (68.2%), and SummEval (62.8%).

Key takeaway

For AI architects evaluating large-scale foundation models, Yuan 3.0 Ultra is worth considering for its one-third parameter reduction and 49% gain in pre-training efficiency. Its expert pruning and rearranging algorithms, combined with RIRM, point toward models that train more cheaply and reason more concisely. Weigh its reported benchmark results and open-source availability when planning your next multimodal MoE deployment.

Key insights

Yuan 3.0 Ultra is a trillion-parameter MoE model optimized for efficiency and concise reasoning.

Principles

Prune underutilized experts for efficiency.
Minimize token variance for throughput.
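
The first principle can be illustrated with a minimal sketch. It assumes that "underutilized" means an expert receives far fewer routed tokens than a uniform split would give it, and that each layer is judged on its own counts. The function name, the min_share_ratio threshold, and the input layout are hypothetical and are not taken from the LAEP description.

```python
def prune_underutilized_experts(token_counts, min_share_ratio=0.2):
    """Return, per layer, the indices of the experts to keep.

    token_counts[l][e] is the number of tokens routed to expert e in
    layer l over some observation window (hypothetical input format).
    An expert is kept when its token count is at least min_share_ratio
    times the uniform share for its layer, so each layer can end up
    with a different number of surviving experts.
    """
    kept = []
    for layer_counts in token_counts:
        total = sum(layer_counts)
        num_experts = len(layer_counts)
        if total == 0:
            # No routing statistics for this layer: keep every expert.
            kept.append(list(range(num_experts)))
            continue
        cutoff = min_share_ratio * total / num_experts
        kept.append([e for e, count in enumerate(layer_counts) if count >= cutoff])
    return kept


# Layer 0 has one starved expert; layer 1 is perfectly balanced.
counts = [[100, 100, 5, 95], [10, 10, 10, 10]]
kept_experts = prune_underutilized_experts(counts)
```

In this toy input only expert 2 of layer 0 falls below the cutoff, so the two layers keep three and four experts respectively.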

Method

The model uses Layer-Adaptive Expert Pruning (LAEP) and Expert Rearranging algorithms for efficiency, alongside a Reflection Inhibition Reward Mechanism (RIRM) for concise reasoning.
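
To make the goal of Expert Rearranging concrete, the sketch below places experts on devices so that per-device token load is as even as possible. It uses a generic greedy heuristic (heaviest expert first, onto the least-loaded device that still has a free slot) as a stand-in; the source does not describe the actual algorithm, and the function names and inputs here are hypothetical.

```python
import heapq
from statistics import pvariance


def rearrange_experts(expert_loads, num_devices):
    """Assign experts to devices, equal count per device, balancing load.

    expert_loads[e] is the token count observed for expert e.
    Returns placement[d] = list of expert indices hosted on device d.
    """
    if len(expert_loads) % num_devices != 0:
        raise ValueError("number of experts must be divisible by num_devices")
    capacity = len(expert_loads) // num_devices
    placement = [[] for _ in range(num_devices)]
    # Min-heap of (current token load, device id) for devices with free slots.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    order = sorted(range(len(expert_loads)),
                   key=lambda e: expert_loads[e], reverse=True)
    for e in order:
        load, d = heapq.heappop(heap)
        placement[d].append(e)
        if len(placement[d]) < capacity:
            heapq.heappush(heap, (load + expert_loads[e], d))
    return placement


def device_loads(expert_loads, placement):
    """Total tokens handled by each device under a given placement."""
    return [sum(expert_loads[e] for e in experts) for experts in placement]


loads = [90, 80, 10, 20]
naive = [[0, 1], [2, 3]]            # experts kept in index order
balanced = rearrange_experts(loads, num_devices=2)
naive_var = pvariance(device_loads(loads, naive))
balanced_var = pvariance(device_loads(loads, balanced))
```

For this toy input the index-order placement puts 170 tokens on one device and 30 on the other, while the rearranged placement gives each device 100, so the device-level variance drops to zero.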

In practice

Achieves 92.6 TFLOPS per GPU.
Reduces parameters by 33.3%.
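
The headline figures can be checked with a few lines of arithmetic. The parameter reduction follows directly from the 1.5T and 1T counts in the summary. The training-time estimate additionally assumes that "49% more efficient" means 49% more work completed per unit time, which the source does not state explicitly.

```python
def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


def time_fraction_after_speedup(efficiency_gain):
    """Fraction of the original time needed for the same work.

    Assumes the gain is a throughput multiplier: a gain of 0.49 means
    1.49x the work per unit time (an interpretation, not a source claim).
    """
    return 1.0 / (1.0 + efficiency_gain)


param_cut_pct = round(relative_reduction(1.5, 1.0) * 100, 1)   # trillions of parameters
time_fraction = round(time_fraction_after_speedup(0.49), 3)
```

Under that reading, a 49% efficiency gain corresponds to roughly two thirds of the original pre-training time for the same amount of work.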

Topics

Yuan 3.0 Ultra
Mixture-of-Experts
Model Efficiency
Multimodal Foundation Model
Expert Pruning

Code references

Yuan-lab-LLM/Yuan3.0-Ultra

Best for: AI Architect, NLP Engineer, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.