Learning When to Translate for Multilingual Reasoning

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Luar, a Language Understanding Boundary-aware Reinforcement Learning framework, trains Reasoning Language Models (RLMs) to invoke English translation selectively on multilingual reasoning tasks. It targets the substantial multilingual reasoning gaps in RLMs, which often stem from language-understanding failures on non-English inputs. Instead of translating every input, Luar trains the model to choose between reasoning directly from the original query and reasoning over its English translation, activating translation only when direct understanding is unreliable and translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multiple multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with the largest gains on low-resource languages. The framework also avoids unnecessary translation when direct reasoning is sufficient, and its selective translation behavior generalizes to previously unseen low-resource languages. The project will be made publicly available at https://github.com/deokhk/LUAR.

Key takeaway

Machine Learning Engineers developing multilingual Reasoning Language Models should consider integrating a selective translation mechanism like Luar. The model learns to decide, per input, whether to translate a non-English query into English before reasoning, which improves performance on reasoning tasks, particularly for low-resource languages where direct understanding often fails. Because translation is skipped when direct reasoning is sufficient, the approach can cut unnecessary translation overhead while improving accuracy.

Key insights

RLMs can learn to selectively translate non-English inputs only when direct understanding is unreliable, improving multilingual reasoning.

Principles

Method

Luar trains RLMs using a reinforcement learning framework to choose between direct reasoning and reasoning over English translation, invoking translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning.
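The decision rule described above can be illustrated with a minimal sketch. This is not Luar's actual reward or training code, which this summary does not specify. It assumes that, for each query, we have estimated accuracies for direct and translator-augmented rollouts, and it adds a small bonus when the model's choice matches a margin-based rule. The names `RolloutStats`, `should_translate`, `shaped_reward`, and the `margin` and `bonus` values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RolloutStats:
    """Hypothetical per-query accuracy estimates from sampled rollouts."""
    direct_acc: float      # fraction of correct rollouts reasoning on the original query
    translated_acc: float  # fraction of correct rollouts reasoning on the English translation


def should_translate(stats: RolloutStats, margin: float = 0.2) -> bool:
    """Translate only if translated reasoning beats direct reasoning by at least `margin`.

    The margin stands in for "substantially outperform"; its value is an assumption.
    """
    return stats.translated_acc - stats.direct_acc >= margin


def shaped_reward(correct: bool, used_translation: bool, stats: RolloutStats,
                  margin: float = 0.2, bonus: float = 0.5) -> float:
    """Illustrative reward: 1.0 for a correct answer, plus a bonus when the
    translate/direct choice agrees with the margin rule (assumed shaping)."""
    base = 1.0 if correct else 0.0
    choice_matches = used_translation == should_translate(stats, margin)
    return base + (bonus if choice_matches else 0.0)


# Hypothetical queries: one where direct reasoning already works well,
# and one where translation helps a lot.
high_resource = RolloutStats(direct_acc=0.9, translated_acc=0.95)
low_resource = RolloutStats(direct_acc=0.2, translated_acc=0.8)

print(should_translate(high_resource))  # False: the gain is below the margin
print(should_translate(low_resource))   # True: translation helps substantially
print(shaped_reward(True, True, low_resource))   # 1.5: correct and well-chosen translation
print(shaped_reward(True, True, high_resource))  # 1.0: correct, but translation was unnecessary
```

In a GRPO-style setup, a scalar like this would be computed per sampled rollout and then normalized within the group. How Luar actually estimates the understanding boundary and shapes its reward should be checked against the original paper and repository.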

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.