Detecting Translation Hallucinations with Attention Misalignment
Summary
The article proposes an interpretable, token-level quality estimation (QE) method for Neural Machine Translation (NMT) that targets hallucinations, which are most common in low-resource or rare language pairs. Unlike black-box signals such as output probability entropy, Semantic Entropy, or xCOMET, the approach reuses existing forward (src→tgt) and backward (tgt→src) NMT models. It derives uncertainty signals by teacher-forcing the backward model and then comparing the transposed cross-attention maps of the two models. From this comparison it extracts 75 attention-alignment features, grouped into Focus, Reciprocity, and Sink, and feeds them into a lightweight MLP classifier. In experiments on ZH→EN and FR→EN, using a dataset of 15k translations annotated via "LLM-as-a-judge," combining the attention features with output entropy improves QE performance over entropy alone, reaching ROC-AUCs of 0.750 (ZH→EN) and 0.849 (FR→EN).
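The core comparison can be illustrated with a minimal sketch. It assumes the forward cross-attention map has shape (target length, source length), the backward map has shape (source length, target length), and agreement is measured by cosine similarity per target token. The function name `reciprocity_scores`, the cosine measure, and the toy matrices are illustrative assumptions, not the article's actual feature definitions.

```python
import numpy as np

def reciprocity_scores(attn_fwd: np.ndarray, attn_bwd: np.ndarray) -> np.ndarray:
    """Cosine agreement between forward and transposed backward attention.

    attn_fwd: (T_tgt, T_src), row t = attention of target token t over the source.
    attn_bwd: (T_src, T_tgt), row s = attention of source token s over the target.
    Returns one score per target token; low values suggest misalignment.
    """
    bwd_t = attn_bwd.T  # (T_tgt, T_src): how strongly each source token points back at token t
    dot = (attn_fwd * bwd_t).sum(axis=1)
    norms = np.linalg.norm(attn_fwd, axis=1) * np.linalg.norm(bwd_t, axis=1)
    return dot / np.maximum(norms, 1e-12)  # guard against all-zero rows

# Toy maps (hypothetical values): 3 target tokens, 3 source tokens.
attn_fwd = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.1],
                     [0.4, 0.3, 0.3]])   # target token 2 attends diffusely
attn_bwd = np.array([[0.90, 0.10, 0.00],
                     [0.05, 0.90, 0.05],
                     [0.30, 0.30, 0.40]])

scores = reciprocity_scores(attn_fwd, attn_bwd)
least_reciprocal = int(np.argmin(scores))  # index of the least mutually aligned target token
```

In this toy case the third target token gets the lowest score because its forward attention is spread out while almost no source token points back at it.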
Key takeaway
Machine Learning Engineers building or deploying NMT systems should consider integrating this bidirectional attention-based quality estimation method. It provides interpretable, token-level uncertainty signals without retraining the core NMT model, so you can allocate resources more efficiently for difficult translations or flag potential hallucinations before deployment. The approach improves on entropy-only methods, especially for typologically distant language pairs.
Key insights
Bidirectional attention map comparison offers an interpretable and efficient way to detect NMT hallucinations.
Principles
- Uncertainty does not always mean error.
- Attention patterns reveal model grounding.
- Combine signals for robust error detection.
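As a rough illustration of combining signals, the sketch below computes per-token output entropy from the decoder's probability distribution and appends it to a matrix of attention features. The helper names `token_entropy` and `combine_features`, and all numeric values, are hypothetical.

```python
import numpy as np

def token_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of a (T_tgt, vocab) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def combine_features(attn_feats: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Append output entropy as one extra column to per-token attention features."""
    return np.concatenate([attn_feats, token_entropy(probs)[:, None]], axis=1)

# Two target tokens over a toy 4-word vocabulary (hypothetical values).
probs = np.array([[0.97, 0.01, 0.01, 0.01],   # confident prediction
                  [0.25, 0.25, 0.25, 0.25]])  # maximally uncertain prediction
attn_feats = np.array([[0.9, 0.3, 0.0],       # placeholder attention features
                       [0.4, 1.1, 0.2]])

entropy = token_entropy(probs)
combined = combine_features(attn_feats, probs)  # shape (2, 4)
```

High entropy alone only says the model was unsure; pairing it with attention features lets a downstream classifier separate uncertain-but-grounded tokens from ungrounded ones.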
Method
Train (or reuse) forward and backward NMT models, generate translations, extract 75 attention-based features (Focus, Reciprocity, Sink) per token, and train a lightweight MLP classifier on these features while keeping the NMT weights frozen.
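To make the Focus and Sink categories more concrete, here is a hedged sketch of three per-token features taken from a single cross-attention map: peak weight and attention entropy as Focus-style features, and the mass placed on one designated position as a Sink-style feature. Treating the last source position (for example an end-of-sequence token) as the sink is an assumption, as are the function name and values; the article's 75 features are not reproduced here.

```python
import numpy as np

def focus_and_sink_features(attn: np.ndarray, sink_index: int = -1) -> np.ndarray:
    """Return a (T_tgt, 3) matrix: [peak weight, attention entropy, sink mass].

    attn: (T_tgt, T_src) cross-attention map whose rows sum to 1.
    sink_index: source position treated as the attention sink (assumed: last token).
    """
    p = np.clip(attn, 1e-12, 1.0)            # avoid log(0)
    peak = attn.max(axis=1)                  # Focus: sharpness of the strongest link
    entropy = -(p * np.log(p)).sum(axis=1)   # Focus: how spread out the attention is
    sink_mass = attn[:, sink_index]          # Sink: weight parked on the sink position
    return np.stack([peak, entropy, sink_mass], axis=1)

# Toy map (hypothetical values): 3 target tokens, 4 source positions, last one is the sink.
attn = np.array([[0.90, 0.05, 0.05, 0.00],
                 [0.10, 0.70, 0.15, 0.05],
                 [0.05, 0.05, 0.05, 0.85]])  # target token 2 dumps its attention on the sink

feats = focus_and_sink_features(attn)
```

A token with a high peak that sits on the sink position looks focused but is not grounded in any content word, which is why the two feature families are kept separate.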
In practice
- Use existing forward/backward NMT models.
- Train a small classifier on attention features.
- Apply to RAG or summarization for grounding.
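The "small classifier" step can be sketched as a one-hidden-layer MLP trained on per-token features while the NMT models stay untouched. The version below uses plain NumPy and synthetic data, so it says nothing about the article's results; the architecture, hyperparameters, and labeling rule are all illustrative assumptions.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.2, epochs=300, seed=0):
    """Fit a one-hidden-layer MLP by full-batch gradient descent; returns (params, losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.3, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.3, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = np.maximum(0.0, X @ W1 + b1)                  # ReLU hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2).ravel()))  # sigmoid output
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(float(loss))
        g_out = ((p - y) / n)[:, None]                    # gradient of the loss w.r.t. logits
        g_h = (g_out @ W2.T) * (h > 0)                    # backpropagate through ReLU
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict_proba(params, X):
    """Probability that each token is flagged (label 1)."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2).ravel()))

# Synthetic stand-in for per-token features and labels (NOT real translation data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(float)  # hypothetical labeling rule

params, losses = train_mlp(X, y)
probs = predict_proba(params, X)
```

In a real setup, `X` would hold the attention features (optionally with output entropy) and `y` the token-level labels; an off-the-shelf MLP from a standard library would serve equally well.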
Topics
- Neural Machine Translation
- Translation Hallucinations
- Attention Misalignment
- Quality Estimation
- Cross-Attention Maps
Best for: AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.