Improving Cross-Format Robustness in Language Models with Multi-Format Training

2026-06-10 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

Large language models are often sensitive to answer format: a question answered correctly in one format may be answered incorrectly in another, semantically equivalent one. The paper defines cross-format robustness as a model's consistency across formats for the same underlying question. The researchers compared full-format training, in which every training item is expanded into multiple equivalent formats, with FormatMix, which expands only a subset of training items, chosen by random or targeted selection. Across GLM4 and Llama-3.1 models, multi-format supervision consistently improved both task performance and cross-format robustness. By contrast, multiple-choice question (MCQ)-only supervision offered little benefit and could even reduce robustness. Expanding roughly 30% of the training set into multiple formats often recovered most of the gains of full-format training, which suggests that format diversity, rather than the sheer amount of additional supervision, is the primary driver of robustness. This lightweight multi-format augmentation offers a practical way to reduce LLM format sensitivity without altering the base model.
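
The summary does not give the paper's exact robustness metric. As a minimal sketch, assuming per-format correctness labels are available for each question, one reasonable reading of "consistency across formats" is the share of questions whose correctness does not change between formats, reported alongside the stricter share answered correctly in every format. The function name `cross_format_scores` and the format labels `mcq` and `open` are hypothetical.

```python
from collections import defaultdict


def cross_format_scores(results):
    """Score cross-format behaviour from per-format evaluation results.

    results: iterable of (question_id, format_name, is_correct) tuples.
    Returns (consistency, all_correct_rate) over questions seen in 2+ formats:
      consistency      = share of questions with identical correctness in every format
      all_correct_rate = share of questions answered correctly in every format
    (Hypothetical metric; not necessarily the paper's definition.)
    """
    outcomes = defaultdict(dict)
    for qid, fmt, correct in results:
        outcomes[qid][fmt] = bool(correct)
    # Only questions evaluated in at least two formats say anything about robustness.
    multi = [list(v.values()) for v in outcomes.values() if len(v) >= 2]
    if not multi:
        return 0.0, 0.0
    consistent = sum(1 for o in multi if len(set(o)) == 1)
    all_correct = sum(1 for o in multi if all(o))
    return consistent / len(multi), all_correct / len(multi)


results = [
    ("q1", "mcq", True), ("q1", "open", True),    # right in both formats
    ("q2", "mcq", True), ("q2", "open", False),   # format-sensitive
    ("q3", "mcq", False), ("q3", "open", False),  # consistently wrong
]
consistency, all_correct = cross_format_scores(results)
print(f"consistency={consistency:.2f}, all_correct={all_correct:.2f}")
# prints "consistency=0.67, all_correct=0.33"
```

Reporting both numbers separates a model that is robust because it is consistently right from one that is merely consistently wrong.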

Key takeaway

For machine learning engineers fine-tuning large language models: integrate multi-format training to improve both cross-format robustness and task performance. Consider using FormatMix to expand approximately 30% of your training data into diverse equivalent formats; this lightweight augmentation often recovers most of the gains of full-format training. Avoid MCQ-only supervision, which can reduce robustness. The result is a model that is less sensitive to answer format, without complex modifications to the base model.

Key insights

Multi-format training, even when only part of the training data is expanded into multiple formats, consistently improves LLM robustness to changes in answer format.

Principles

Format diversity drives robustness.
Partial multi-format training is effective.
MCQ-only training can reduce robustness.

Method

FormatMix expands a subset of training items (e.g., 30%) into multiple equivalent formats using random or targeted selection to improve cross-format robustness in LLMs.
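
The selection step can be sketched as follows, under explicit assumptions: each item carries a question, answer options, and a gold answer; items that are not selected stay in a single base format; and only two formats (MCQ and open-ended) exist. All names here (`format_mix`, `as_mcq`, `as_open_ended`, the `score` hook) are hypothetical, and targeted selection is left as a caller-supplied score because the summary does not say which criterion the paper uses.

```python
import random


# Hypothetical format renderers: each maps one item (question, options, answer)
# to a (prompt, target) pair. The paper's actual format set is not specified here.
def as_open_ended(item):
    return f"Question: {item['question']}\nAnswer:", item["answer"]


def as_mcq(item):
    letters = "ABCDEFGH"
    options = item["options"]
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    gold = letters[options.index(item["answer"])]
    prompt = f"Question: {item['question']}\n" + "\n".join(lines) + "\nAnswer:"
    return prompt, gold


FORMATTERS = {"mcq": as_mcq, "open": as_open_ended}


def format_mix(items, ratio=0.3, base_format="mcq", score=None, seed=0):
    """Expand a `ratio` share of items into every format; keep the rest in base_format.

    score=None -> random selection; otherwise the highest-scoring items are
    expanded (a stand-in for targeted selection; the criterion is an assumption).
    """
    k = round(ratio * len(items))
    indices = list(range(len(items)))
    if score is None:
        chosen = set(random.Random(seed).sample(indices, k))
    else:
        chosen = set(sorted(indices, key=lambda i: score(items[i]), reverse=True)[:k])

    examples = []
    for i, item in enumerate(items):
        # Selected items get every format; the rest keep only the base format.
        names = FORMATTERS if i in chosen else [base_format]
        for name in names:
            prompt, target = FORMATTERS[name](item)
            examples.append({"id": i, "format": name, "prompt": prompt, "target": target})
    return examples


items = [{"question": f"Q{i}?", "options": ["a", "b", "c", "d"], "answer": "b"} for i in range(10)]
mixed = format_mix(items, ratio=0.3)
print(len(mixed))  # 13: 7 items in one format + 3 items in two formats
```

Because selection only decides which items receive extra formats, switching from random to targeted selection means passing a `score` function; what that score should measure (for example, how inconsistently a reference model answers an item across formats) is an illustrative choice here, not the paper's.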

In practice

Expand about 30% of training items into multiple formats.
Avoid MCQ-only supervision.
Apply FormatMix for LLM fine-tuning.
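
The first two rules above, expanding roughly 30% of items and avoiding MCQ-only supervision, can be enforced as a check before fine-tuning. This is a minimal sketch assuming each training example records its source item `id` and a `format` label, with `"mcq"` marking multiple-choice examples; `check_training_mix` and its default tolerance are hypothetical choices, not part of the paper.

```python
from collections import Counter, defaultdict


def check_training_mix(examples, target_ratio=0.3, tolerance=0.05):
    """Sanity-check a multi-format training set before fine-tuning.

    examples: list of dicts with at least "id" (source item) and "format" keys.
    Raises ValueError if the mix is MCQ-only or the expanded share drifts
    more than `tolerance` from `target_ratio`.
    """
    if not examples:
        raise ValueError("empty training set")
    formats_per_item = defaultdict(set)
    for ex in examples:
        formats_per_item[ex["id"]].add(ex["format"])

    # Rule: do not train on MCQ supervision alone.
    all_formats = set().union(*formats_per_item.values())
    if all_formats <= {"mcq"}:
        raise ValueError("training mix is MCQ-only; add at least one non-MCQ format")

    # Rule: roughly `target_ratio` of source items should appear in 2+ formats.
    expanded = sum(1 for fmts in formats_per_item.values() if len(fmts) > 1)
    ratio = expanded / len(formats_per_item)
    if abs(ratio - target_ratio) > tolerance:
        raise ValueError(f"expanded share {ratio:.2f} differs from target {target_ratio:.2f}")
    return {"expanded_share": ratio, "format_counts": Counter(ex["format"] for ex in examples)}


examples = [{"id": i, "format": "mcq"} for i in range(10)]
examples += [{"id": i, "format": "open"} for i in range(3)]
report = check_training_mix(examples)
print(report["expanded_share"], report["format_counts"]["open"])  # 0.3 3
```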

Topics

Cross-Format Robustness
Language Models
Multi-Format Training
FormatMix
Data Augmentation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.