TLPO: Token-Level Policy Optimization for Mitigating Language Confusion in Large Language Models

2026-04-29 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Token-Level Policy Optimization (TLPO) is a fine-tuning framework for reducing language confusion in large language models (LLMs), the failure to respond consistently in the intended language. Prior mitigation relies on sequence-level fine-tuning methods such as DPO, ORPO, and GRPO, which can degrade general model capabilities as a side effect. TLPO instead applies localized, token-level updates: it identifies error-prone positions, explores alternative candidate tokens at those positions, and updates the policy to suppress the outputs that induce errors. This selective intervention improves language consistency across multiple multilingual LLMs and diverse languages, outperforming the baselines while preserving downstream task accuracy.

Key takeaway

For AI engineers deploying multilingual LLMs, TLPO offers a targeted way to improve language consistency without the capability trade-offs of sequence-level fine-tuning. Consider it when outputs must reliably match the intended language and accuracy on other tasks must be preserved.

Key insights

TLPO uses token-level policy optimization to mitigate language confusion in LLMs without degrading general capabilities.

Principles

Localized updates prevent general capability degradation.
Granular intervention improves language consistency.

Method

TLPO identifies error-prone token positions, explores alternative candidate tokens, and updates the policy using a tailored objective to suppress error-inducing outputs.
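The three steps above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not specify the detector, the candidate search, or the objective, so `find_error_position`, `explore_candidates`, `token_level_loss`, the ASCII-based language check, and the example logits are all hypothetical stand-ins chosen for illustration.

```python
import math


def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


def find_error_position(tokens, is_target_language):
    """Step 1 (assumed detector): index of the first off-language token, or None."""
    for idx, tok in enumerate(tokens):
        if not is_target_language(tok):
            return idx
    return None


def explore_candidates(logits, k):
    """Step 2 (assumed search): the k highest-scoring tokens at one position."""
    return sorted(logits, key=logits.get, reverse=True)[:k]


def token_level_loss(logits, candidates, is_target_language):
    """Step 3 (hypothetical objective): negative log of the share of candidate
    probability mass held by in-language tokens. Lower is better."""
    probs = softmax(logits)
    good = sum(probs[tok] for tok in candidates if is_target_language(tok))
    total = sum(probs[tok] for tok in candidates)
    return -math.log(max(good, 1e-12) / total)


# Toy example: the target language is English, approximated by an ASCII check.
response = ["The", "answer", "是", "42"]
position = find_error_position(response, str.isascii)

# Made-up logits for the single error-prone position.
step_logits = {"is": 1.0, "是": 2.0, "was": 0.5, "の": -1.0}
candidates = explore_candidates(step_logits, k=3)
loss = token_level_loss(step_logits, candidates, str.isascii)

print(position, candidates, round(loss, 3))
```

Only the flagged position contributes to the loss, so a gradient step on it would leave every other token's distribution untouched. That is the sense in which the update is localized, in contrast to a sequence-level objective that scores the whole response.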

In practice

Apply TLPO to multilingual LLMs.
Use TLPO for language consistency tasks.
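Applying any such method requires a way to measure language consistency before and after fine-tuning. The sketch below is a crude proxy based on Unicode script names, not the metric used in the paper; `script_of`, `in_language_share`, `pass_rate`, and the 0.95 threshold are assumptions made for illustration, and a script check cannot distinguish languages that share a script.

```python
import unicodedata


def script_of(ch):
    """Crude script label: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def in_language_share(text, target_script):
    """Fraction of alphabetic characters whose script label equals target_script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # nothing to judge, treat as consistent
    hits = sum(1 for ch in letters if script_of(ch) == target_script)
    return hits / len(letters)


def pass_rate(responses, target_script, threshold=0.95):
    """Share of responses whose in-script fraction meets the threshold."""
    if not responses:
        return 0.0
    passed = sum(
        1 for text in responses
        if in_language_share(text, target_script) >= threshold
    )
    return passed / len(responses)


responses = ["The answer is 42.", "The answer 是 42."]
rate = pass_rate(responses, "LATIN")
print(rate)
```

Comparing this rate on the same prompts before and after fine-tuning, alongside downstream task accuracy, gives a rough view of whether consistency improved without a capability regression.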

Topics

Token-Level Policy Optimization
Language Confusion
Multilingual LLMs
Fine-tuning Frameworks
Policy Optimization

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.