Learning When to Translate for Multilingual Reasoning

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, medium

Summary

Luar, a Language Understanding Boundary-aware Reinforcement Learning framework, addresses significant multilingual reasoning gaps in Reasoning Language Models (RLMs) caused by failures in non-English input understanding. While English translation can mitigate these issues, translating every input is often unnecessary. Luar trains RLMs to selectively invoke translation, enabling the model to choose between directly solving the original input or reasoning over its English translation. This framework encourages translation only when translator-augmented reasoning is anticipated to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar demonstrates superior performance compared to standard GRPO and other training-based baselines, showing particularly large gains on low-resource languages. Further analysis indicates Luar effectively avoids superfluous translation when direct reasoning is sufficient and successfully extends its selective translator-call behavior to unseen low-resource languages. The project code will be publicly available.

Key takeaway

For machine learning engineers building multilingual reasoning models, Luar suggests adding a selective translation step to RLM pipelines instead of translating every input. This mitigates language-understanding failures, particularly for low-resource languages, without the overhead of universal translation. The model learns to judge when translation is likely to help and invokes it only in those cases, which makes multilingual reasoning both more robust and cheaper to run.
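As a rough illustration of what a selective translation step could look like at inference time, the sketch below routes a question through a translator only when the model asks for it. This is not Luar's interface: the `<translate>` marker, the `answer` function, and the `model` and `translator` callables are all hypothetical names chosen for this example.

```python
from typing import Callable, Tuple

# Hypothetical marker the model emits to request a translation.
# The actual tool-call format used by Luar is not given in the summary.
TRANSLATE_TAG = "<translate>"


def answer(
    question: str,
    model: Callable[[str], str],
    translator: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (answer, used_translation) for a single question.

    The model sees the original input first. If its output starts with
    the translation marker, the question is translated to English and
    the model is run again on the translation. Otherwise the first
    output is taken as the direct answer.
    """
    first_output = model(question)
    if first_output.strip().startswith(TRANSLATE_TAG):
        english_question = translator(question)
        return model(english_question), True
    return first_output, False
```

With this shape, the translator cost is paid only for inputs where the model requests it, which is the efficiency argument the takeaway makes.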

Key insights

Luar enables RLMs to selectively translate non-English inputs only when direct understanding is unreliable, improving multilingual reasoning.

Principles

Method

Luar trains Reasoning Language Models via reinforcement learning to choose between direct input processing and reasoning over English translation, invoking translation only when it substantially outperforms direct reasoning.
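The summary does not give Luar's reward function or the threshold it uses, so the following is only a minimal sketch of the stated rule: favor translation when translator-augmented reasoning beats direct reasoning by a clear margin. Estimating each mode's quality from the accuracy of sampled rollouts, the `margin` value, and both function names are assumptions made for illustration.

```python
from typing import List


def estimated_accuracy(rollout_correct: List[bool]) -> float:
    """Fraction of sampled rollouts that reached the correct answer."""
    if not rollout_correct:
        return 0.0
    return sum(rollout_correct) / len(rollout_correct)


def translation_preferred(
    direct_rollouts: List[bool],
    translated_rollouts: List[bool],
    margin: float = 0.3,  # assumed value, not taken from the paper
) -> bool:
    """True when translated reasoning beats direct reasoning by at least `margin`.

    Each list holds one correctness flag per sampled rollout for the
    same question: one set solved from the original input, the other
    solved from its English translation.
    """
    gain = estimated_accuracy(translated_rollouts) - estimated_accuracy(direct_rollouts)
    return gain >= margin
```

A signal like this could be used to reward translator calls only on questions where it returns `True`, so that the policy is discouraged from translating inputs it already handles well directly.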

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.