Online Dynamic Batching with Formal Guarantees for LLM Training
Summary
Online Dynamic Batching (ODB) is a new DataLoader-side system for large language model (LLM) training. It addresses the problem that a sample's cost is unknown until the sample has been preprocessed. Traditional batch samplers assume fixed costs, whereas ODB moves batch formation to the point where the true training cost is observable, after preprocessing, augmentation, templating, tokenization, and multimodal visual-token expansion. ODB is a drop-in system that preserves Distributed Data Parallel (DDP) step alignment. That requirement is formalized as the Distributed Group Alignment Problem (DGAP), with proven deadlock-free, bounded termination. ODB requires no changes to models, optimizers, or attention kernels. Benchmarks on 2B/8B Qwen3-VL models across the UltraChat, LLaVA, and ShareGPT4o datasets show that ODB improves emitted-sample throughput by 1.58-2.51x on single-node Full FT/LoRA and by 1.71-3.78x on two-node Full FT, while maintaining comparable quality. Production MM-Mix runs achieved 4.43x throughput. ODB also performs within 15% of the GMT/BMT offline token-budget oracles on UltraChat/LLaVA and is faster than them on high-CV ShareGPT4o, where it reaches 2.24-2.39x single-node and 3.06-3.69x two-node gains.
Key takeaway
MLOps engineers and AI scientists optimizing LLM fine-tuning should consider integrating Online Dynamic Batching (ODB) to raise training throughput. The reported gains are 1.58-4.43x in emitted-sample throughput across UltraChat, LLaVA, ShareGPT4o, and MM-Mix, with comparable model quality and no kernel rewrites. ODB is most relevant for multimodal or highly heterogeneous data, where it can shorten training time and improve resource utilization.
Key insights
Online Dynamic Batching (ODB) optimizes LLM training by forming batches after the true sample costs have been observed, while preserving DDP step alignment.
Principles
- Batch formation should occur post-preprocessing for accurate cost observability.
- Formal guarantees for distributed synchronization are crucial for online batching.
- Throughput gains are achievable without model or kernel modifications.
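The second principle can be illustrated with a small simulation. In DDP, every rank takes part in a gradient all-reduce on each step, so all ranks must run the same number of steps. When batches are formed from observed costs, ranks holding the same number of samples can end up with different batch counts. The sketch below is an illustration only, not the DGAP protocol, which this summary does not describe. The function names, the sum-of-tokens budget, and the "truncate to the minimum" rule are all assumptions chosen to show the problem in its simplest form.

```python
from typing import List


def count_batches(costs: List[int], budget: int) -> int:
    """Number of batches one rank forms when greedily packing costs under a budget."""
    batches, used = 0, 0
    for cost in costs:
        # Close the current batch if adding this sample would exceed the budget.
        if used and used + cost > budget:
            batches += 1
            used = 0
        used += cost
    # Count the final, partially filled batch if there is one.
    return batches + (1 if used else 0)


def aligned_steps(per_rank_costs: List[List[int]], budget: int) -> int:
    """Naive alignment (an assumption, not ODB's method): run only as many
    steps as the rank with the fewest batches can supply."""
    return min(count_batches(costs, budget) for costs in per_rank_costs)


# Hypothetical post-preprocessing token counts: two ranks, four samples each.
rank_costs = [[100, 100, 100, 100], [400, 300, 350, 50]]
counts = [count_batches(costs, budget=400) for costs in rank_costs]
print(counts)                                  # [1, 3]
print(aligned_steps(rank_costs, budget=400))   # 1
```

Rank 0 packs its four short samples into one batch, while rank 1 needs three. Without coordination, rank 1 would wait on a collective that rank 0 never joins. Truncating to the minimum avoids the hang but discards data, which is why a protocol with formal termination guarantees is the more interesting design.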
Method
ODB is a DataLoader-side drop-in system that moves batch formation to the point of accurate cost observability, preserving DDP step alignment and offering formal guarantees.
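A minimal sketch of the idea, assuming a padded token budget as the cost model: samples are preprocessed first, and a batch is closed only once the observed lengths would exceed the budget. This is not ODB's actual implementation. `budget_batches`, `toy_tokenize`, and the `token_budget` parameter are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def budget_batches(
    samples: Iterable[str],
    preprocess: Callable[[str], Sequence[int]],
    token_budget: int,
) -> Iterator[List[Sequence[int]]]:
    """Yield batches whose padded size (rows x longest row) fits token_budget.

    A single sample longer than the budget is emitted as its own batch.
    """
    batch: List[Sequence[int]] = []
    longest = 0
    for raw in samples:
        tokens = preprocess(raw)  # the true cost is only known after this call
        new_longest = max(longest, len(tokens))
        # Padded cost if this sample joins: every row is padded to the longest.
        if batch and new_longest * (len(batch) + 1) > token_budget:
            yield batch
            batch, new_longest = [], len(tokens)
        batch.append(tokens)
        longest = new_longest
    if batch:
        yield batch


def toy_tokenize(text: str) -> List[int]:
    # Stand-in for templating and tokenization: one token id per word.
    return [len(word) for word in text.split()]


texts = ["a b", "c d e", "f", "g h i j k l", "m n"]
sizes = [[len(t) for t in b] for b in budget_batches(texts, toy_tokenize, token_budget=8)]
print(sizes)  # [[2, 3], [1], [6], [2]]
```

The batch size varies with the observed lengths: short samples are grouped together, and the six-token sample gets a batch of its own. A sampler that fixes batch composition before preprocessing cannot make that decision, because the lengths do not exist yet.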
In practice
- Integrate "online-dynamic-batching" for LLM fine-tuning.
- Apply ODB to high-heterogeneity multimodal datasets.
- Use ODB to improve throughput on 2B/8B Qwen3-VL models.
Topics
- Online Dynamic Batching
- LLM Training Optimization
- Distributed Data Parallel
- Multimodal LLMs
- Throughput Enhancement
- Formal Guarantees
Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.