Online Dynamic Batching with Formal Guarantees for LLM Training

2026-06-18 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Online Dynamic Batching (ODB) is a DataLoader-side system for large language model (LLM) training that addresses a specific problem: a sample's true training cost is unknown until it has been preprocessed. Traditional batch samplers assume fixed per-sample costs. ODB instead moves batch formation to the point where the real cost is observable, that is, after preprocessing, augmentation, templating, tokenization, and multimodal visual-token expansion. It is a drop-in component that requires no changes to models, optimizers, or attention kernels. It preserves Distributed Data Parallel (DDP) step alignment, which the authors formalize as the Distributed Group Alignment Problem (DGAP) and for which they prove deadlock-free, bounded termination. In benchmarks on 2B/8B Qwen3-VL models across the UltraChat, LLaVA, and ShareGPT4o datasets, ODB improves emitted-sample throughput by 1.58-2.51x on single-node Full FT/LoRA and by 1.71-3.78x on two-node Full FT, at comparable quality. Production MM-Mix runs reached 4.43x throughput. ODB also performs within 15% of the GMT/BMT offline token-budget oracles on UltraChat/LLaVA and is faster than them on high-CV ShareGPT4o, reaching 2.24-2.39x single-node and 3.06-3.69x two-node gains.

Key takeaway

MLOps engineers and AI scientists optimizing LLM fine-tuning should consider integrating Online Dynamic Batching (ODB) to raise training throughput. The reported gains are 1.58-4.43x in emitted-sample throughput across UltraChat, LLaVA, ShareGPT4o, and MM-Mix, with comparable model quality and no kernel rewrites. ODB can shorten training time and improve resource utilization, especially on multimodal or highly heterogeneous data.

Key insights

Online Dynamic Batching (ODB) speeds up LLM training by forming batches only after true sample costs have been observed, while preserving DDP step alignment.

Principles

Batch formation should occur post-preprocessing for accurate cost observability.
Formal guarantees for distributed synchronization are crucial for online batching.
Throughput gains are achievable without model or kernel modifications.
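
To see why the second principle matters, consider a toy simulation of what happens when each DDP rank forms token-budget batches independently. This is only an illustrative sketch: the names `count_batches` and `aligned_step_count` are hypothetical, the greedy sum-of-tokens packing is an assumption, and truncating to the minimum step count is a deliberately naive policy, not the DGAP protocol that ODB uses.

```python
from typing import List, Sequence, Tuple


def count_batches(lengths: Sequence[int], token_budget: int) -> int:
    """Greedily pack samples in arrival order; return how many batches result.

    Hypothetical helper: a batch is closed as soon as adding the next
    sample would push its total token count over the budget.
    """
    steps, used = 0, 0
    for n in lengths:
        if used and used + n > token_budget:
            steps += 1
            used = 0
        used += n
    return steps + (1 if used else 0)


def aligned_step_count(
    per_rank_lengths: Sequence[Sequence[int]], token_budget: int
) -> Tuple[List[int], int]:
    """Return each rank's local step count and a naive common step count.

    Taking the minimum is only one simple way to make ranks agree; it is
    an assumption for illustration and drops the surplus batches.
    """
    counts = [count_batches(lengths, token_budget) for lengths in per_rank_lengths]
    return counts, min(counts)


# Both ranks receive four samples, but the token lengths differ.
rank_lengths = [[4, 4, 4, 4], [8, 8, 8, 8]]
counts, common = aligned_step_count(rank_lengths, token_budget=8)
print(counts, common)
```

With equal sample counts, the two ranks still produce a different number of batches. In real DDP training every rank must take part in each gradient synchronization, so a mismatch like this would leave one rank waiting on a collective that the other never enters. That is the failure mode a formal alignment guarantee has to rule out.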

Method

ODB is a DataLoader-side drop-in system that moves batch formation to the point of accurate cost observability, preserving DDP step alignment and offering formal guarantees.
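
The core idea of forming batches after cost becomes observable can be sketched in a few lines of plain Python. This is a minimal single-process sketch under stated assumptions, not ODB's implementation: the function name `online_token_budget_batches`, the `preprocess` callable, and the padded-cost model (longest sequence times batch size) are all hypothetical choices for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def online_token_budget_batches(
    samples: Iterable[object],
    preprocess: Callable[[object], Sequence[int]],
    token_budget: int,
) -> Iterator[List[Sequence[int]]]:
    """Yield batches whose padded token cost stays within `token_budget`.

    The cost of a sample is measured only after `preprocess` has run,
    which is the point where the true token count is known.
    Assumed cost model: max sequence length in the batch * batch size.
    """
    batch: List[Sequence[int]] = []
    max_len = 0
    for raw in samples:
        tokens = preprocess(raw)  # true cost is observable only from here on
        n = len(tokens)
        if n > token_budget:
            # Oversized sample: flush the pending batch and emit it alone.
            if batch:
                yield batch
                batch, max_len = [], 0
            yield [tokens]
            continue
        new_max = max(max_len, n)
        if batch and new_max * (len(batch) + 1) > token_budget:
            # Adding this sample would exceed the budget: close the batch.
            yield batch
            batch, new_max = [], n
        batch.append(tokens)
        max_len = new_max
    if batch:
        yield batch


# Stand-in preprocessing: pretend the raw sample is just its token count.
fake_preprocess = lambda n: [0] * n
batches = list(online_token_budget_batches([2, 3, 8, 1, 1, 10, 4], fake_preprocess, 8))
print([[len(t) for t in b] for b in batches])
```

Batch sizes vary with the observed lengths instead of being fixed in advance, which is what lets short samples be packed densely. The sketch ignores the distributed side entirely; keeping ranks step-aligned is the separate problem that the source describes as DGAP.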

In practice

Integrate "online-dynamic-batching" for LLM fine-tuning.
Apply ODB to high-heterogeneity multimodal datasets.
Use ODB to improve throughput on 2B/8B Qwen3-VL models.

Topics

Online Dynamic Batching
LLM Training Optimization
Distributed Data Parallel
Multimodal LLMs
Throughput Enhancement
Formal Guarantees

Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.