Minibatch Selection via Partition Matroid Constrained Gradient Matching
Summary
PartitionSel is a new minibatch selection method for fine-tuning Large Language Models (LLMs) on heterogeneous datasets. It targets the trade-off between rapid convergence and comprehensive coverage of data domains, which existing methods handle poorly: they either select examples independently or depend on computationally expensive proxy models. PartitionSel maximizes a validation-guided gradient-matching utility subject to per-domain budgets, which are encoded as a partition-matroid constraint. This design couples selections across domains, aiming to reduce redundancy and produce more compatible training updates. The objective is weakly submodular, so an orthogonal matching pursuit algorithm can optimize it with provable approximation guarantees. In experiments fine-tuning Qwen2.5 and Llama-3 on the MetaMathQA and Mol-Instructions benchmarks, PartitionSel showed robust gains over per-domain and domain-agnostic baselines and reduced the number of conflicting gradient pairs within batches.
Key takeaway
Machine Learning Engineers fine-tuning Large Language Models on heterogeneous data should consider PartitionSel for minibatch selection. In the reported experiments on MetaMathQA and Mol-Instructions, it outperformed per-domain and domain-agnostic baselines and reduced conflicting gradient pairs within batches, which points to more compatible training updates. It is worth evaluating in your own fine-tuning workflow before adopting it.
Key insights
PartitionSel balances domain coverage and convergence when fine-tuning LLMs on heterogeneous data by combining a gradient-matching objective with a partition-matroid constraint.
Principles
- Gradient matching improves minibatch selection.
- Partition matroids constrain per-domain budgets.
- Cross-domain coupling reduces selection redundancy.
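One plausible way to write the three principles above as a single optimization problem is given here. The notation is an assumption made for illustration and is not taken from the original article: V is the candidate pool split into domains V_1, ..., V_K, b_k is the budget for domain k, g_i is the gradient of example i, and g_val is the validation gradient.

```latex
\max_{S \subseteq V} \; f(S)
  = \lVert g_{\mathrm{val}} \rVert_2^2
  - \min_{w \in \mathbb{R}^{|S|}}
    \Bigl\lVert g_{\mathrm{val}} - \sum_{i \in S} w_i \, g_i \Bigr\rVert_2^2
\quad \text{subject to} \quad
|S \cap V_k| \le b_k , \qquad k = 1, \dots, K .
```

Under this reading, the objective rewards subsets whose weighted gradients reconstruct the validation gradient, and the constraint is a partition matroid over the domains. Because the weights are fit jointly over all of S, the value of an example depends on what has already been chosen in other domains, which is where the cross-domain coupling comes from.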
Method
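PartitionSel maximizes a validation-guided gradient-matching utility under per-domain budgets, encoded as a partition-matroid constraint. It optimizes this objective with an orthogonal matching pursuit algorithm; because the objective is weakly submodular, the procedure carries provable approximation guarantees. Selecting jointly across domains couples the choices and aims to reduce redundancy.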
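The selection procedure described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `partition_omp`, its arguments, and the use of raw (unnormalized) per-example gradients are all assumptions made for illustration. It follows the generic orthogonal matching pursuit pattern (pick the feasible example most correlated with the residual, refit weights by least squares, update the residual) while skipping any domain whose budget is exhausted.

```python
import numpy as np

def partition_omp(grads, target, domains, budgets):
    """Hypothetical OMP-style selection under per-domain budgets.

    grads:   (n, d) array of per-example gradients (assumed precomputed)
    target:  (d,) validation gradient to match
    domains: length-n sequence of domain labels
    budgets: dict mapping domain label -> max examples from that domain
    """
    grads = np.asarray(grads, dtype=float)
    target = np.asarray(target, dtype=float)
    remaining = dict(budgets)            # slots left in each domain
    selected = []
    weights = np.zeros(0)
    residual = target.copy()

    for _ in range(sum(budgets.values())):
        # Correlation of every candidate with the current residual.
        scores = np.abs(grads @ residual)
        # Mask candidates that are already chosen or whose domain is full.
        for i in range(len(scores)):
            if i in selected or remaining.get(domains[i], 0) <= 0:
                scores[i] = -np.inf
        best = int(np.argmax(scores))
        if not np.isfinite(scores[best]):
            break                        # no feasible candidate left
        selected.append(best)
        remaining[domains[best]] -= 1
        # Refit weights jointly over all selected examples (the "orthogonal" step).
        basis = grads[selected].T        # shape (d, |S|)
        weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = target - basis @ weights

    return selected, weights, residual

# Toy example: two domains, one slot each.
toy_grads = np.array([[2.0, 0.0, 0.0],   # domain "a"
                      [0.9, 0.0, 0.0],   # domain "a", redundant with row 0
                      [0.0, 1.0, 0.0],   # domain "b"
                      [0.0, 0.0, 1.0]])  # domain "b", orthogonal to the target
toy_target = np.array([1.0, 1.0, 0.0])
toy_domains = ["a", "a", "b", "b"]
chosen, w, r = partition_omp(toy_grads, toy_target, toy_domains, {"a": 1, "b": 1})
```

Because the weights are refit over the whole selected set at every step, an example that duplicates a direction already covered by another domain scores close to zero against the residual, which is one way to read the reduced redundancy described above. In a real fine-tuning setup the gradients would typically be low-dimensional proxies (for example, last-layer or projected gradients), which is a further assumption not covered by this summary.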
In practice
- Fine-tune LLMs on diverse datasets.
- Improve training update compatibility.
- Outperform domain-agnostic baselines.
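To check update compatibility in your own batches, one simple diagnostic is to count gradient pairs that point in opposing directions. The sketch below assumes that a "conflicting" pair means two per-example gradients with a negative inner product; the summary does not state the exact definition used in the original article, and the function name `count_conflicting_pairs` is hypothetical.

```python
import numpy as np

def count_conflicting_pairs(grads):
    """Count unordered pairs of rows with a negative inner product.

    grads: (n, d) array of per-example gradients for one minibatch.
    The negative-inner-product criterion is an assumption, not the
    article's stated definition.
    """
    grads = np.asarray(grads, dtype=float)
    gram = grads @ grads.T                               # pairwise inner products
    upper = np.triu_indices(grads.shape[0], k=1)         # each pair once, no diagonal
    return int(np.sum(gram[upper] < 0))

batch = np.array([[1.0, 0.0],
                  [-1.0, 0.0],   # opposes row 0
                  [0.0, 1.0]])   # orthogonal to both
n_conflicts = count_conflicting_pairs(batch)
```

Comparing this count between batches drawn by a baseline sampler and batches drawn by a selection method gives a rough, model-agnostic signal of whether the selected examples pull the parameters in compatible directions.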
Topics
- Minibatch Selection
- Large Language Models
- Gradient Matching
- Partition Matroids
- LLM Fine-tuning
- Heterogeneous Data
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.