Minibatch Selection via Partition Matroid Constrained Gradient Matching

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

PartitionSel is a new minibatch selection method for fine-tuning Large Language Models (LLMs) on heterogeneous datasets. It targets the tension between rapid convergence and comprehensive coverage of data domains, which existing methods handle poorly because they either select independently or depend on computationally expensive proxy models. PartitionSel maximizes a validation-guided gradient-matching utility subject to per-domain budgets, encoded as a partition-matroid constraint. This design couples selections across domains, with the aim of reducing redundancy and producing more compatible training updates. The objective is weakly submodular, so it can be optimized with an orthogonal matching pursuit (OMP) algorithm that carries provable approximation guarantees. In the reported experiments, fine-tuning Qwen2.5 and Llama-3 on the MetaMathQA and Mol-Instructions benchmarks, PartitionSel showed robust gains over per-domain and domain-agnostic baselines and reduced the number of conflicting gradient pairs within batches.
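
The summary does not say how a "conflicting gradient pair" is defined. A minimal sketch of one way to measure it, assuming the common convention that two per-example gradients conflict when their inner product is negative (the function name `count_conflicting_pairs` is hypothetical, not from the article):

```python
import numpy as np

def count_conflicting_pairs(G):
    """Count unordered pairs of rows in G whose inner product is negative.

    G: (m, d) array, one (flattened) gradient per example in the batch.
    Assumption: "conflicting" means a strictly negative dot product.
    """
    G = np.asarray(G, dtype=float)
    dots = G @ G.T                              # all pairwise inner products
    upper = np.triu_indices(G.shape[0], k=1)    # each unordered pair once
    return int(np.sum(dots[upper] < 0))
```

Comparing this count for batches drawn by different selection rules would give a rough view of how compatible the updates within each batch are.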

Key takeaway

Machine Learning Engineers fine-tuning Large Language Models on heterogeneous data should consider evaluating PartitionSel for minibatch selection. In the reported experiments on MetaMathQA and Mol-Instructions, it improved convergence and domain coverage relative to per-domain and domain-agnostic baselines and reduced conflicting gradient pairs within batches, which indicates more compatible training updates.

Key insights

PartitionSel optimizes LLM fine-tuning on heterogeneous data by balancing domain coverage and convergence via gradient-matching and matroid constraints.
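
The summary does not give the objective explicitly. One plausible way to write it down, assuming a standard regularized gradient-matching utility (the symbols $g_i$, $g_{\mathrm{val}}$, $w$, $\lambda$, $D_k$, and $b_k$ are notation introduced here for illustration, not taken from the article):

```latex
% Candidate pool V is split into K disjoint domains D_1, ..., D_K
% with per-domain budgets b_1, ..., b_K.
% g_i: training gradient of example i;  g_val: validation gradient.
\[
  f(S) \;=\; \lVert g_{\mathrm{val}} \rVert_2^2
  \;-\; \min_{w}\Bigl(
      \bigl\lVert g_{\mathrm{val}} - \textstyle\sum_{i \in S} w_i\, g_i \bigr\rVert_2^2
      + \lambda \lVert w \rVert_2^2
  \Bigr)
\]
\[
  \max_{S \subseteq V} \; f(S)
  \quad \text{subject to} \quad
  \lvert S \cap D_k \rvert \le b_k \quad \text{for } k = 1, \dots, K .
\]
```

The constraint set is exactly the family of independent sets of a partition matroid. Because all domains share the single target $g_{\mathrm{val}}$, an example chosen from one domain changes how useful the remaining candidates in every other domain are.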

Principles

Method

PartitionSel maximizes a validation-guided gradient-matching utility under per-domain budgets, encoded as a partition-matroid constraint. It optimizes this objective with an orthogonal matching pursuit algorithm, so selections are coupled across domains and redundancy is reduced.
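
To make the procedure concrete, here is a minimal sketch of greedy OMP-style selection under per-domain budgets. It is an illustration under stated assumptions, not the authors' implementation: the function name `partition_omp`, the ridge term, the unconstrained (signed) weights, and the stopping rule are all choices made here.

```python
import numpy as np

def partition_omp(G, g_val, domains, budgets, ridge=1e-3):
    """Greedy OMP-style selection under a partition-matroid constraint.

    G:       (n, d) array, one gradient (or gradient proxy) per candidate.
    g_val:   (d,) validation gradient to be matched.
    domains: length-n sequence of domain labels, one per candidate.
    budgets: dict mapping every domain label to its maximum count.
    Returns (selected indices, ridge-regression weights).
    """
    G = np.asarray(G, dtype=float)
    g_val = np.asarray(g_val, dtype=float)
    domains = list(domains)
    counts = {k: 0 for k in budgets}
    chosen = np.zeros(G.shape[0], dtype=bool)
    selected, weights = [], np.zeros(0)
    residual = g_val.copy()

    for _ in range(sum(budgets.values())):
        # Feasible = not yet chosen and its domain still has budget left.
        has_budget = np.array([counts[d] < budgets[d] for d in domains])
        feasible = has_budget & ~chosen
        if not feasible.any():
            break
        # OMP step: take the feasible candidate most aligned with the residual.
        scores = np.where(feasible, G @ residual, -np.inf)
        best = int(np.argmax(scores))
        if scores[best] <= 0:
            break  # assumed stopping rule: nothing left aligns with the residual
        selected.append(best)
        chosen[best] = True
        counts[domains[best]] += 1
        # Refit all weights jointly (ridge least squares), then update residual.
        A = G[selected]
        gram = A @ A.T + ridge * np.eye(len(selected))
        weights = np.linalg.solve(gram, A @ g_val)
        residual = g_val - weights @ A
    return selected, weights
```

Because the residual is shared, a near-duplicate of an already selected example scores close to zero in later rounds regardless of its domain, which is the cross-domain coupling described above. For an LLM, `G` would in practice hold low-dimensional gradient proxies (for example last-layer or projected gradients) rather than full parameter gradients; that choice is also an assumption here.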

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.