Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing, Data Science & Analytics) · Depth: Expert, extended

Summary

Researchers introduced PolyFact, a large-scale parallel multilingual factual QA dataset comprising 100,000 Wikidata-grounded facts across 12 typologically diverse languages. This dataset was used to evaluate methods for improving cross-lingual factual recall in large language models like Qwen-2.5-7B and OLMo-2-1124-7B. Consistency-driven reinforcement learning via Group Relative Policy Optimization (GRPO) consistently outperformed supervised fine-tuning (SFT), enhancing both cross-lingual consistency and generalization to unseen languages. Light continual pretraining (CPT) on parallel data yielded limited additional gains. Mechanistic analyses revealed that GRPO reorganizes multilingual routing by reducing language specialization in MLP layers and attention heads, promoting shared cross-lingual representations instead of surface-level memorization. The code, models, and dataset are open-sourced.

Key takeaway

AI scientists and machine learning engineers developing multilingual LLMs should prioritize consistency-driven reinforcement learning (GRPO) over supervised fine-tuning when the goal is better cross-lingual factual recall and generalization to unseen languages. According to the paper's mechanistic analyses, GRPO reorganizes internal multilingual routing toward shared cross-lingual representations, whereas SFT tends toward surface-level memorization. Evaluate models on free-form generation tasks such as KLAR to confirm genuine cross-lingual retrieval rather than candidate selection.

Key insights

Consistency-driven reinforcement learning improves LLM cross-lingual factual recall by promoting shared internal representations.

Method

PolyFact, a 100K-fact, 12-language Wikidata-grounded QA dataset, enables consistency-driven RL via GRPO. GRPO uses grouped rollouts with a reward bonus for all-language correctness.
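
The reward shaping described above can be illustrated with a minimal sketch. It is not the paper's implementation: the base reward (fraction of languages answered correctly), the bonus value of 0.5, the use of population standard deviation, and the function names `rollout_reward` and `group_relative_advantages` are all assumptions chosen for illustration. The only parts taken from the summary are that rollouts are grouped and that a bonus applies when every language is correct.

```python
from statistics import mean, pstdev


def rollout_reward(correct_by_lang, bonus=0.5):
    """Reward for one rollout of a single fact asked in several languages.

    Assumed shaping: mean per-language correctness, plus a hypothetical
    bonus when the answer is correct in every language.
    """
    flags = list(correct_by_lang.values())
    base = sum(flags) / len(flags)
    return base + (bonus if all(flags) else 0.0)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for the same fact, scored in three languages.
group = [
    {"en": True, "de": True, "sw": True},     # consistent and correct
    {"en": True, "de": True, "sw": False},
    {"en": True, "de": False, "sw": False},
    {"en": False, "de": False, "sw": False},
]
rewards = [rollout_reward(r) for r in group]
advantages = group_relative_advantages(rewards)
```

In this toy group, only the first rollout earns the bonus, so it receives the largest advantage relative to its peers. A group in which every rollout scores the same yields zero advantages and therefore contributes no learning signal.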

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.