Improving Cross-Format Robustness in Language Models with Multi-Format Training
Summary
Large language models are often sensitive to answer format: a question answered correctly in one form may be answered incorrectly in another, semantically equivalent form. This paper defines cross-format robustness as a model's consistency across formats for the same underlying question. The authors compare full-format training with FormatMix, a technique that expands only a subset of training items into multiple equivalent formats, with the subset chosen by random or targeted selection. Across GLM4 and Llama-3.1 models, multi-format supervision consistently improved both task performance and cross-format robustness. Notably, supervision using only multiple-choice questions (MCQ) offered little benefit and could even decrease robustness. Expanding approximately 30% of the training set into multiple formats often achieved most of the gains seen with full-format training, which suggests that format diversity, rather than additional supervision alone, is the primary driver of robustness. This lightweight multi-format augmentation offers a practical way to reduce LLM format sensitivity without altering the base model.
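To make the definition concrete, here is a minimal sketch of one way to score cross-format consistency. It is an assumption for illustration, not the paper's exact metric: the function name `cross_format_consistency` and the `(question_id, format_name, is_correct)` record layout are hypothetical, and the score is simply the share of questions whose outcome is the same in every format.

```python
from collections import defaultdict


def cross_format_consistency(results):
    """Share of questions whose correctness is identical across all formats.

    `results` is an iterable of (question_id, format_name, is_correct) tuples.
    This record layout is a hypothetical choice for the sketch.
    """
    outcomes_by_question = defaultdict(list)
    for question_id, _format_name, is_correct in results:
        outcomes_by_question[question_id].append(bool(is_correct))

    if not outcomes_by_question:
        return 0.0

    # A question counts as consistent when every format gave the same outcome.
    consistent = sum(
        1 for outcomes in outcomes_by_question.values() if len(set(outcomes)) == 1
    )
    return consistent / len(outcomes_by_question)


# Toy evaluation log: q1 is right in both formats, q2 flips between formats.
results = [
    ("q1", "mcq", True), ("q1", "open", True),
    ("q2", "mcq", True), ("q2", "open", False),
]
score = cross_format_consistency(results)
```

Note that this variant also counts a question as consistent when the model is wrong in every format, so it is best read alongside task accuracy rather than instead of it.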
Key takeaway
If you fine-tune large language models, consider multi-format training to improve both cross-format robustness and task performance. FormatMix lets you expand only part of your training data, approximately 30%, into multiple equivalent formats; in the study this lightweight augmentation often recovered most of the gains from full-format training. Avoid relying on multiple-choice question (MCQ) supervision alone, as it can reduce robustness. The approach makes a model less sensitive to answer format without requiring changes to the base model.
Key insights
Multi-format training, even when applied to only part of the training data, consistently improves LLM robustness to answer-format changes.
Principles
- Format diversity drives robustness.
- Partial multi-format training is effective.
- MCQ-only training can reduce robustness.
Method
FormatMix selects a subset of training items (e.g., 30%), either at random or by targeted selection, and expands each selected item into multiple equivalent formats to improve cross-format robustness in LLMs.
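The random-selection variant can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the helper names (`format_mix`, `to_open_ended`, `to_true_false`), the item fields, and the choice to leave unselected items in the first format only are all hypothetical. Targeted selection is not shown because the summary does not describe its criterion.

```python
import random


def to_open_ended(item):
    # Hypothetical formatter: the question as a free-form prompt.
    return {"prompt": item["question"], "target": item["answer"], "format": "open"}


def to_true_false(item):
    # Hypothetical formatter: the same question recast as a true/false check.
    prompt = f'{item["question"]} Proposed answer: {item["answer"]}. True or false?'
    return {"prompt": prompt, "target": "True", "format": "true_false"}


def format_mix(items, formatters, fraction=0.3, seed=0):
    """Expand a random `fraction` of items into every format in `formatters`.

    Unselected items are rendered with the first formatter only (an assumption
    made for this sketch).
    """
    rng = random.Random(seed)
    n_expand = round(len(items) * fraction)
    chosen = set(rng.sample(range(len(items)), n_expand))

    mixed = []
    for index, item in enumerate(items):
        if index in chosen:
            # Selected item: one training example per format.
            mixed.extend(fmt(item) for fmt in formatters)
        else:
            # Unselected item: keep a single-format example.
            mixed.append(formatters[0](item))
    return mixed


items = [{"question": f"What is {n} + {n}?", "answer": str(2 * n)} for n in range(10)]
mixed = format_mix(items, [to_open_ended, to_true_false], fraction=0.3)
```

With 10 items and `fraction=0.3`, three items are expanded into both formats and seven stay single-format, so the augmented set grows only modestly compared with expanding every item.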
In practice
- Expand roughly 30% of training items into multiple equivalent formats.
- Avoid MCQ-only supervision.
- Apply FormatMix when fine-tuning LLMs.
Topics
- Cross-Format Robustness
- Language Models
- Multi-Format Training
- FormatMix
- Data Augmentation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.