Improving Cross-Format Robustness in Language Models with Multi-Format Training

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

Large language models are often sensitive to answer format: a question answered correctly in one format may be answered incorrectly in another, semantically equivalent format. This paper defines cross-format robustness as a model's consistency across formats for the same underlying question. The researchers compared full-format training, in which training items are expanded into multiple equivalent formats, with FormatMix, a technique that expands only a subset of training items, chosen by random or targeted selection. Across GLM4 and Llama-3.1 models, multi-format supervision consistently improved both task performance and cross-format robustness. By contrast, supervision on multiple-choice questions (MCQ) alone offered little benefit and could even decrease robustness. Expanding approximately 30% of the training set into multiple formats often achieved most of the gains seen with full-format training, which suggests that format diversity, rather than the sheer amount of additional supervision, is the primary driver of robustness. This lightweight multi-format augmentation offers a practical way to reduce LLM format sensitivity without altering the base model.
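One way to make the definition of cross-format robustness concrete is to measure how often a model's correctness agrees across all formats of the same question. The sketch below is only an illustration of that idea: the function name `cross_format_consistency` and the `(question_id, format_name, is_correct)` record layout are assumptions made here, and the paper's actual metric may be defined differently.

```python
from collections import defaultdict


def cross_format_consistency(results):
    """Fraction of questions whose outcome is identical across all formats.

    `results` is an iterable of (question_id, format_name, is_correct)
    tuples. This layout is a hypothetical one chosen for illustration.
    """
    outcomes = defaultdict(list)
    for question_id, _format_name, is_correct in results:
        outcomes[question_id].append(bool(is_correct))

    # Only questions evaluated in at least two formats can be compared.
    comparable = [v for v in outcomes.values() if len(v) > 1]
    if not comparable:
        return 0.0

    # A question counts as consistent when every format agrees
    # (all correct or all incorrect).
    consistent = sum(1 for v in comparable if len(set(v)) == 1)
    return consistent / len(comparable)


results = [
    ("q1", "mcq", True), ("q1", "open", True),    # consistent
    ("q2", "mcq", True), ("q2", "open", False),   # format-sensitive
]
score = cross_format_consistency(results)
```

Under this definition a model that is wrong in every format also counts as consistent, so the score is best read alongside plain task accuracy rather than in place of it.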

Key takeaway

Machine learning engineers fine-tuning large language models can use multi-format training to improve both cross-format robustness and task performance. Consider using FormatMix to expand approximately 30% of the training data into multiple equivalent formats; in the study, this lightweight augmentation often recovered most of the gains from full-format training. Avoid relying on multiple-choice (MCQ) supervision alone, as it offered little benefit and could reduce robustness. The approach makes models less sensitive to answer format without modifying the base model.

Key insights

Multi-format training consistently improved LLM robustness to answer-format changes, and expanding only part of the training set (approximately 30%) often recovered most of that gain.

Method

FormatMix selects a subset of training items (for example, about 30%), either at random or by a targeted criterion, and expands each selected item into multiple semantically equivalent formats to improve cross-format robustness in LLMs.
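The random-selection variant can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the item schema, the three renderers (`to_open`, `to_mcq`, `to_true_false`), the `base_format` used for unexpanded items, and the function name `format_mix` are all assumptions introduced here. Targeted selection is not shown because the summary does not describe its criterion.

```python
import random

LETTERS = "ABCDEFGH"


# Hypothetical renderers: each turns one item into one training example.
def to_open(item, rng):
    return {"id": item["id"], "format": "open",
            "prompt": item["question"], "target": item["answer"]}


def to_mcq(item, rng):
    options = "\n".join(
        f"{LETTERS[i]}. {choice}" for i, choice in enumerate(item["choices"])
    )
    target = LETTERS[item["choices"].index(item["answer"])]
    return {"id": item["id"], "format": "mcq",
            "prompt": f"{item['question']}\n{options}", "target": target}


def to_true_false(item, rng):
    # Propose one of the choices and ask whether it is the right answer.
    candidate = rng.choice(item["choices"])
    prompt = (f"{item['question']}\nProposed answer: {candidate}\n"
              "Is this correct? Answer True or False.")
    return {"id": item["id"], "format": "true_false",
            "prompt": prompt, "target": str(candidate == item["answer"])}


FORMATS = {"open": to_open, "mcq": to_mcq, "true_false": to_true_false}


def format_mix(items, fraction=0.3, base_format="open", seed=0):
    """Expand a random `fraction` of items into every format in FORMATS.

    Unselected items are rendered once in `base_format` (an assumption;
    the summary does not say which format the remaining items keep).
    """
    rng = random.Random(seed)
    k = round(fraction * len(items))
    chosen = set(rng.sample(range(len(items)), k))

    mixed = []
    for index, item in enumerate(items):
        if index in chosen:
            mixed.extend(render(item, rng) for render in FORMATS.values())
        else:
            mixed.append(FORMATS[base_format](item, rng))
    return mixed


items = [
    {"id": i, "question": f"Question {i}?",
     "choices": ["a", "b", "c", "d"], "answer": "b"}
    for i in range(10)
]
mixed = format_mix(items, fraction=0.3, seed=0)
```

With 10 items, 3 formats, and `fraction=0.3`, three items are expanded into three examples each and seven stay in the base format. Setting `fraction=1.0` corresponds to the full-format baseline in this sketch.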

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.