MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

2026-04-16 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

MixAtlas is a framework for compute-efficient data-mixture optimization in multimodal large language model (LLM) midtraining, accepted at NADPFM ICLR 2026. It addresses the underexplored problem of data-mixture optimization for multimodal pretraining by decomposing training data along two axes: "image concepts" and "task supervision". This decomposition makes mixture control interpretable and allows performance to be attributed to individual domains. MixAtlas uses small proxy models and a Gaussian-process surrogate to explore the mixture space at 1/100th the cost of full-scale training. The optimized mixtures yield up to 3x faster convergence and consistent 2-5% gains across benchmarks, with the largest boosts on text-rich tasks such as ChartQA (+10%) and TextVQA (+13%). Mixtures found with proxy models transfer to larger models, preserving both the efficiency and the accuracy gains.

Key takeaway

For AI engineers and research scientists developing multimodal LLMs, MixAtlas offers a practical, compute-efficient recipe for optimizing data mixtures. Its systematic domain decomposition and proxy-model search are worth adopting to get faster convergence and measurable accuracy gains, especially on text-rich benchmarks, without the cost of tuning mixtures at full scale.

Key insights

MixAtlas optimizes multimodal LLM data mixtures using proxy models and interpretable domain decomposition for efficiency and performance.

Principles

Systematic domain decomposition improves mixture control.
Smaller proxy models can predict optimal mixtures.
Interpretable axes aid performance attribution.

Method

MixAtlas factorizes training data by image concepts and task supervision, then uses small proxy models with a Gaussian-process surrogate to explore the mixture space and identify optimal data proportions.
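
The loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the RBF kernel, the upper-confidence-bound selection rule, the three-domain setup, and the `proxy_score` function (a toy stand-in for training and evaluating a small proxy model) are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """RBF kernel between every row of `a` and every row of `b`."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Posterior mean and standard deviation of a GP with unit prior variance."""
    y_mean = y_train.mean()
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_query, x_train)
    mean = y_mean + k_s @ np.linalg.solve(k, y_train - y_mean)
    v = np.linalg.solve(k, k_s.T)
    var = 1.0 - (k_s * v.T).sum(axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def proxy_score(mixture):
    """Hypothetical stand-in for 'train a proxy model on this mixture, then evaluate'.

    A toy objective that peaks at an arbitrary mixture; a real run would
    return a benchmark score from a small trained model.
    """
    target = np.array([0.5, 0.3, 0.2])
    return -np.sum((mixture - target) ** 2)


rng = np.random.default_rng(0)
n_domains = 3  # assumed number of data domains

# Seed the surrogate with a few random mixtures (points on the simplex).
x_train = rng.dirichlet(np.ones(n_domains), size=8)
y_train = np.array([proxy_score(m) for m in x_train])

for _ in range(12):
    # Score random candidate mixtures with the surrogate instead of training.
    candidates = rng.dirichlet(np.ones(n_domains), size=512)
    mean, std = gp_posterior(x_train, y_train, candidates)
    # Assumed acquisition rule: favor high predicted score plus uncertainty.
    pick = candidates[np.argmax(mean + 1.0 * std)]
    x_train = np.vstack([x_train, pick])
    y_train = np.append(y_train, proxy_score(pick))

best_mixture = x_train[np.argmax(y_train)]
print("best mixture found:", np.round(best_mixture, 3))
```

Only the 20 calls to `proxy_score` would involve training; the 512 candidates per round are ranked by the surrogate alone, which is where the cost saving in a scheme like this comes from.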

In practice

Factor data along interpretable axes.
Use proxy models for cost-effective optimization.
Expect the largest gains on text-rich benchmarks such as ChartQA and TextVQA.
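
As a rough illustration of the first point, the sketch below tags each sample on two axes and draws a batch according to per-domain weights. The field names, the concept and task labels, and the weights are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical samples, each labeled on both axes.
samples = [
    {"id": 0, "concept": "charts", "task": "qa"},
    {"id": 1, "concept": "charts", "task": "qa"},
    {"id": 2, "concept": "documents", "task": "ocr"},
    {"id": 3, "concept": "natural_scenes", "task": "captioning"},
    {"id": 4, "concept": "documents", "task": "ocr"},
]


def bucket_by_axes(samples):
    """Group samples into domains keyed by (image concept, task supervision)."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[(s["concept"], s["task"])].append(s)
    return dict(buckets)


def draw_batch(buckets, weights, batch_size, seed=0):
    """Draw a batch whose domain proportions follow `weights` in expectation."""
    rng = random.Random(seed)
    domains = sorted(buckets)
    probs = [weights.get(d, 0.0) for d in domains]
    picked = rng.choices(domains, weights=probs, k=batch_size)
    return [rng.choice(buckets[d]) for d in picked]


buckets = bucket_by_axes(samples)
mixture = {
    ("charts", "qa"): 0.5,
    ("documents", "ocr"): 0.3,
    ("natural_scenes", "captioning"): 0.2,
}
batch = draw_batch(buckets, mixture, batch_size=8)
print([s["id"] for s in batch])
```

Because every domain is a named (concept, task) pair, a change in benchmark score can be traced back to the weight of a specific pair, which is the kind of attribution interpretable axes are meant to support.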

Topics

MixAtlas
Multimodal LLMs
Data Mixture Optimization
Proxy Models
Gaussian Process Surrogate

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.