Decentralized Autoregressive Generation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

A theoretical analysis introduces Decentralized Autoregressive Generation and defines the Decentralized Discrete Flow Matching objective, in which the probability-generating velocity is a linear combination of expert flows. The work extends Discrete Flow Matching to the discrete time domain and shows that autoregressive generation is a special degenerate case. Experiments on multimodal large language models (MLLMs), using LLaVA-1.5-7B and InternVL 2.5-1B, test the equivalence between decentralized and centralized training. Expert ensembles reach near-parity with compute-matched dense baselines, with results that vary by benchmark. For LLaVA, VQAv2 improved by 1.49 while MME dropped by 33.32. InternVL gained 6 to 8 points on visual grounding (RefCOCO) but dropped 55.92 on MME. Ablation studies examine how expert count, vision encoder choice, and clustering algorithm affect performance.
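One way to read "a linear combination of expert flows" is through the standard mixture identity for flow matching on discrete states. The sketch below is an illustration under assumed notation, not the paper's own derivation: K experts, mixing weights \pi_k, expert probability paths p_t^k, and expert velocities u_t^k are all symbols introduced here.

```latex
% Assumed setup: the global path is a mixture of K expert paths.
\begin{align}
p_t(x) &= \sum_{k=1}^{K} \pi_k \, p_t^{k}(x), \qquad \sum_{k=1}^{K} \pi_k = 1 \\
% Each expert velocity generates its own path (Kolmogorov forward equation).
\frac{d}{dt} p_t^{k}(y) &= \sum_{x} u_t^{k}(y, x) \, p_t^{k}(x) \\
% Posterior-weighted linear combination of the expert velocities.
u_t(y, x) &= \sum_{k=1}^{K} \frac{\pi_k \, p_t^{k}(x)}{p_t(x)} \, u_t^{k}(y, x) \\
% Substituting (3) and using (2) and (1) shows u_t generates the mixture path.
\sum_{x} u_t(y, x) \, p_t(x)
  &= \sum_{k=1}^{K} \pi_k \sum_{x} u_t^{k}(y, x) \, p_t^{k}(x)
   = \frac{d}{dt} p_t(y)
\end{align}
```

Under these assumptions the combination weights are the posterior probabilities that state x belongs to expert k, which is what makes independently trained experts composable at generation time.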

Key takeaway

For AI architects or machine learning engineers facing MLLM training scalability or infrastructure limits, decentralized autoregressive generation is a viable option. You can reach near-parity with compute-matched centralized training, as shown with LLaVA and InternVL, while mitigating single-node failure risk, though results vary by benchmark and MME scores dropped for both models. Consider expert-based training with clustering-based data partitioning to reduce communication overhead and make collaborative model development more accessible.

Key insights

Decentralized autoregressive generation with expert flows reaches near-parity with compute-matched centralized MLLM training, with benchmark-dependent gains and losses.

Principles

Method

The approach extends Discrete Flow Matching to discrete time and defines the probability-generating velocity as a linear combination of expert flows. The image-text dataset is partitioned with balanced spherical k-means on CLIP features, and each expert is trained independently on its own partition. At inference, a top-k routing strategy selects experts by cosine similarity.
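The partition-and-route pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: random vectors stand in for CLIP features, the greedy capacity-limited assignment is one simple way to balance clusters, and the softmax weighting with a `temperature` parameter and the `mix_expert_distributions` helper are assumptions introduced here for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def balanced_spherical_kmeans(features, num_experts, iters=10, seed=0):
    """Cluster unit-normalized features by cosine similarity with a size cap.

    Balancing is done greedily (an assumption of this sketch): the most
    similar (point, cluster) pairs are assigned first, and a cluster stops
    accepting points once it reaches ceil(n / num_experts).
    """
    rng = np.random.default_rng(seed)
    x = l2_normalize(np.asarray(features, dtype=np.float64))
    n = x.shape[0]
    capacity = -(-n // num_experts)  # ceiling division
    centroids = x[rng.choice(n, size=num_experts, replace=False)].copy()
    labels = np.full(n, -1, dtype=np.int64)
    for _ in range(iters):
        sims = x @ centroids.T  # cosine similarity, shape (n, num_experts)
        labels = np.full(n, -1, dtype=np.int64)
        counts = np.zeros(num_experts, dtype=np.int64)
        for flat in np.argsort(-sims, axis=None):
            i, k = divmod(int(flat), num_experts)
            if labels[i] == -1 and counts[k] < capacity:
                labels[i] = k
                counts[k] += 1
        for k in range(num_experts):
            members = x[labels == k]
            if len(members) > 0:
                # Spherical update: mean direction projected to the sphere.
                centroids[k] = l2_normalize(members.sum(axis=0))
    return centroids, labels


def route_top_k(query_feature, centroids, k=2, temperature=0.1):
    """Pick the k experts whose centroids are most cosine-similar to the query.

    The softmax weights over the selected similarities are an assumption.
    """
    q = l2_normalize(np.asarray(query_feature, dtype=np.float64))
    sims = centroids @ q
    top = np.argsort(-sims)[:k]
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())
    return top, weights / weights.sum()


def mix_expert_distributions(expert_probs, weights):
    """Weighted average of the selected experts' next-token distributions."""
    return np.tensordot(weights, expert_probs, axes=1)


# Usage with random stand-in features (real use would pass CLIP embeddings).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
centroids, labels = balanced_spherical_kmeans(features, num_experts=4)
experts, weights = route_top_k(features[0], centroids, k=2)
```

Each expert would then be trained only on the samples with its label, and at inference the routed experts' next-token distributions could be combined with `mix_expert_distributions`.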

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.