AdaMame: A Training Recipe for Adaptive Multilingual Reasoning

2026-06-13 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

AdaMame is a two-stage training recipe for improving multilingual mathematical reasoning in Large Reasoning Models (LRMs). It targets "language collapse", where a model fails to reason in the language of the query. Existing reinforcement learning (RL) fixes often trade away accuracy, introduce mid-trace code-switching, or inflate token usage; AdaMame instead adaptively aligns the reasoning language to the query language while preserving accuracy. The first stage is Supervised Fine-Tuning (SFT) on naturally occurring reasoning traces in five languages, which builds foundational multilingual reasoning capability. The second stage is RL with AdaMame-GRPO, an adaptation of Group Relative Policy Optimization (GRPO) that uses a progressively growing, query-conditioned alignment factor to move the model from exploring diverse reasoning languages to exploiting reasoning in the query language. Evaluated on two benchmarks, two LRMs, and 12 languages, AdaMame-GRPO achieved Pareto-optimal performance across reasoning accuracy, language fidelity, and token efficiency, with significant improvements, particularly for out-of-domain, lower-resource languages.

Key takeaway

For Machine Learning Engineers deploying multilingual Large Reasoning Models, AdaMame addresses "language collapse" and the trade-offs that existing RL fixes incur. Consider adopting its two-stage training, particularly AdaMame-GRPO, when you need Pareto-optimal performance across reasoning accuracy, language fidelity, and token efficiency. The reported gains are significant, especially for out-of-domain and lower-resource languages, so that models reason in the query language without giving up accuracy.

Key insights

AdaMame's two-stage training adaptively aligns an LRM's reasoning language to the query language, resolving "language collapse" without compromising accuracy.

Principles

Adaptive alignment improves multilingual reasoning.
Progressive guidance moves the model from exploring reasoning languages to exploiting the query language.
Pareto-optimal performance across multiple metrics is achievable.
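
To make "Pareto-optimal" concrete, here is a minimal Python sketch of a dominance check over the three metrics named above. It is an illustration only: the configuration names and numbers are hypothetical placeholders, not results from the article, and token efficiency is represented as negated token count so that higher is better on every axis.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly
    better on at least one (all metrics are higher-is-better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the names of configurations that no other one dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p)
                   for other, q in points.items() if other != name)
    }


# Hypothetical placeholders: (accuracy, language fidelity, -mean tokens).
example_points = {
    "config_a": (0.70, 0.95, -900),
    "config_b": (0.72, 0.60, -1200),
    "config_c": (0.65, 0.90, -1000),
}
front = pareto_front(example_points)
```

In this toy data, config_c is dominated by config_a on all three metrics, while config_a and config_b each win on at least one metric and therefore both sit on the front.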

Method

AdaMame trains in two stages: SFT on naturally occurring reasoning traces in five languages, then RL with AdaMame-GRPO, whose query-conditioned alignment factor grows over training to align the reasoning language with the query language.
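
The summary does not give the exact form of the alignment factor, so the following Python sketch is only one plausible reading, under stated assumptions. It assumes a linear schedule from 0 to a maximum weight, a reward that scales a correctness signal down by the scheduled language mismatch, and the standard GRPO step of normalizing rewards within a group of sampled traces. The function names, the schedule, and the reward form are all hypothetical and should not be read as the authors' implementation.

```python
from statistics import mean, pstdev


def alignment_factor(step, total_steps, max_weight=1.0):
    """Assumed linear schedule: 0 at the first step, max_weight at the last."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * progress


def shaped_reward(correct, fidelity, alpha):
    """Assumed reward form. `fidelity` is the fraction of the reasoning
    trace written in the query language, in [0, 1]. With alpha = 0 only
    correctness counts (exploration); as alpha grows, traces in other
    languages earn less (exploitation of the query language)."""
    return float(correct) * (1.0 - alpha * (1.0 - fidelity))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each reward normalized by the mean and
    (population) standard deviation of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled traces: (is_correct, fidelity).
group = [(True, 1.0), (True, 0.0), (False, 1.0), (False, 0.0)]

early = [shaped_reward(c, f, alignment_factor(0, 100)) for c, f in group]
late = [shaped_reward(c, f, alignment_factor(100, 100)) for c, f in group]
late_advantages = group_relative_advantages(late)
```

Under these assumptions, the two correct traces are rewarded equally at the start of training regardless of language, while by the end only the correct trace written in the query language receives a positive advantage.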

In practice

Apply AdaMame to improve LRM performance in lower-resource languages.
Use AdaMame-GRPO for balanced accuracy, fidelity, and token efficiency.

Topics

Multilingual Reasoning
Large Reasoning Models
Language Collapse
Reinforcement Learning
Supervised Fine-Tuning
Token Efficiency

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.