Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Translate-R1, a reinforcement learning (RL) policy developed by Amazon Stores Foundation AI, teaches large language models (LLMs) to decide when calling a translation tool is worth its cost, trading off cost against performance across languages and domains. The policy continues RL from the post-trained Qwen3-4B model across 22 languages in 3 resource tiers and 5 domains, using confidence-gated GSPO for cost-sensitive tool use. It achieves reward lifts of +4.6 on High-, +23.5 on Low-, and +17.5 on XLow-resource languages. Compared with an unconstrained policy, Translate-R1 preserves full reward at 63% of the cost and is Pareto-optimal across 87% of the cost-sensitivity range. It also improves by +18.7 on two synthetic, completely unseen languages and transfers zero-shot to 9 held-out languages. These results rest on an answer-preserving translation pipeline with 98.4% fidelity.

Key takeaway

For AI scientists and ML engineers deploying multilingual LLMs, a confidence-gated RL policy like Translate-R1 lets models use translation tools adaptively: it cuts unnecessary tool calls for high-resource languages while preserving the gains on low-resource and unseen languages. Consider this approach when a multilingual application needs a Pareto-optimal cost-performance trade-off, so that translation spend goes only where it improves accuracy.

Key insights

A learned policy allows LLMs to introspect their comprehension and invoke translation tools only when necessary, balancing performance and cost.
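
The decision described above can be illustrated with a minimal sketch. Translate-R1 learns when to call the tool through RL; the fixed threshold, the `gate` and `answer_with_gate` functions, and the `translate` and `solve` callables below are all hypothetical stand-ins, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class GateDecision:
    use_tool: bool      # True when the translation tool should be called
    confidence: float   # the confidence score the decision was based on


def gate(confidence: float, threshold: float = 0.7) -> GateDecision:
    """Hypothetical gate: call the tool only when confidence is below threshold.

    The threshold value is an assumption for illustration; a learned policy
    would replace this fixed rule.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return GateDecision(use_tool=confidence < threshold, confidence=confidence)


def answer_with_gate(
    prompt: str,
    confidence: float,
    translate: Callable[[str], str],
    solve: Callable[[str], str],
    threshold: float = 0.7,
) -> Tuple[str, GateDecision]:
    """Translate the prompt first only if the gate fires, then solve it."""
    decision = gate(confidence, threshold)
    text = translate(prompt) if decision.use_tool else prompt
    return solve(text), decision
```

With a rule of this shape, a prompt the model already understands well never incurs the translation cost, while a low-confidence prompt is routed through the tool before being answered.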

Method

Continue RL on Qwen3-4B using confidence-gated GSPO, leveraging an answer-preserving translation pipeline for multilingual verifiable rewards.
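
As a rough illustration of what "cost-sensitive" can mean here, the sketch below assumes a linear penalty: task reward minus a cost-sensitivity weight times tool cost. This summary does not state the paper's reward formula, so the form, the name `lam`, and both functions are assumptions. The only figure taken from the summary is the 63% relative cost.

```python
def cost_aware_reward(task_reward: float, tool_cost: float, lam: float) -> float:
    """Assumed reward shape: task reward minus a linear tool-cost penalty.

    lam is a hypothetical cost-sensitivity weight (lam = 0 ignores cost).
    """
    if tool_cost < 0 or lam < 0:
        raise ValueError("tool_cost and lam must be non-negative")
    return task_reward - lam * tool_cost


def gated_advantage(
    task_reward: float,
    lam: float,
    gated_cost: float = 0.63,         # 63% of unconstrained cost, per the summary
    unconstrained_cost: float = 1.0,  # unconstrained policy cost, normalized to 1
) -> float:
    """Net-reward gap between a gated and an unconstrained policy that
    reach the same task reward."""
    gated = cost_aware_reward(task_reward, gated_cost, lam)
    unconstrained = cost_aware_reward(task_reward, unconstrained_cost, lam)
    return gated - unconstrained


if __name__ == "__main__":
    # task_reward = 1.0 is a placeholder; it cancels out of the gap.
    for lam in (0.0, 0.5, 1.0):
        print(f"lam={lam}: advantage={gated_advantage(1.0, lam):.3f}")
```

Under this assumed shape, two policies with equal task reward tie when cost is ignored, and the cheaper one pulls ahead by `lam * 0.37` as soon as cost carries any weight.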

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.