Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization
Summary
Layer-Resolved Optimal Transport (OT) is extended for unsupervised hallucination detection in neural machine translation (NMT) and abstractive summarization. Analyzing the six decoder layers of the Fairseq DE-EN model (N=3,414), the authors found that Wass-to-Unif and Wass-to-Data are complementary detectors, with detection signal concentrated in layers L1-L4. Hallucinated NMT translations notably lack an initial exploratory attention phase. For abstractive summarization, the OT detector on AggreFact (N=1,116) achieved 57.2% and 57.6% balanced accuracy on the CNN and XSum subsets, below the supervised MiniCheck-Flan-T5-L (69.9% and 74.3%). The gap arises because an unfaithful summary can attend to the correct source content yet misrepresent it, a failure mode invisible to concentration-based OT. Structural experiments on T5-base confirmed a consistent decoder organization, with layer 3 showing peak concentration and layer 12 being critical for generation quality. OT is reliable when the failure is source disengagement but limited when faithfulness issues arise downstream of attention.
Key takeaway
For machine learning engineers evaluating NMT or abstractive summarization models: Layer-Resolved Optimal Transport effectively identifies hallucinations that stem from source disengagement, but it is fundamentally limited in detecting unfaithful summaries that misrepresent content despite correct source attention. Consider combining OT with other faithfulness metrics to cover a broader range of failure modes.
Key insights
Optimal Transport detects hallucinations from source disengagement, but not content misrepresentation.
Principles
- Wass-to-Unif and Wass-to-Data are complementary hallucination detectors.
- Hallucination detection is layer-dependent in NMT decoders.
- Unfaithful summaries can attend correctly while misrepresenting content.
Method
Optimal Transport measures the geometric distance between a cross-attention distribution and a reference distribution to detect source disengagement in NMT and summarization.
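To make the distance concrete, here is a minimal sketch of a Wass-to-Unif style score. It assumes the cross-attention has already been reduced to a single distribution over source positions, that positions sit on a line with unit spacing, and that the transport cost is the absolute position difference. The function names `wasserstein_1d` and `wass_to_unif` are hypothetical, and the paper's exact cost function and normalization may differ.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on positions 0..n-1.

    Assumption: unit spacing between source positions and |i - j| as cost.
    In one dimension this equals the summed gap between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same source positions")
    total_p, total_q = sum(p), sum(q)
    cdf_p = list(accumulate(x / total_p for x in p))
    cdf_q = list(accumulate(x / total_q for x in q))
    # The last CDF entries are both 1, so they contribute nothing.
    return sum(abs(a - b) for a, b in zip(cdf_p[:-1], cdf_q[:-1]))


def wass_to_unif(attention_mass):
    """Distance from a source attention distribution to the uniform one."""
    n = len(attention_mass)
    return wasserstein_1d(attention_mass, [1.0 / n] * n)


spread = wass_to_unif([0.25, 0.25, 0.25, 0.25])  # attention spread evenly
peaked = wass_to_unif([1.0, 0.0, 0.0, 0.0])      # all mass on one position
```

In this sketch an evenly spread distribution scores zero and a distribution collapsed onto one position scores higher. How a score is turned into a hallucination decision (direction and threshold) is not specified in this summary and would need to be calibrated on held-out data.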
In practice
- Apply OT on cross-attention for NMT hallucination detection.
- Use OT as an interpretability tool for attention mechanisms.
- Recognize OT's limits when faithfulness failures are downstream of attention.
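The layer-resolved use of OT described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes cross-attention is available as nested lists indexed `[layer][target_token][source_position]`, averages over target tokens to get one source distribution per layer, and scores each layer by its distance to uniform with unit-spaced positions. The helper names and the choice to average the first four layers (motivated by the reported L1-L4 concentration) are hypothetical.

```python
from itertools import accumulate


def w1_to_uniform(mass):
    """1-Wasserstein distance to uniform on positions 0..n-1 (unit spacing assumed)."""
    n = len(mass)
    total = sum(mass)
    cdf = list(accumulate(m / total for m in mass))
    return sum(abs(c - (i + 1) / n) for i, c in enumerate(cdf[:-1]))


def source_mass(attn_rows):
    """Average per-target-token attention rows into one source distribution."""
    n_tgt = len(attn_rows)
    return [sum(col) / n_tgt for col in zip(*attn_rows)]


def layer_scores(cross_attn):
    """One score per decoder layer from cross_attn[layer][target][source]."""
    return [w1_to_uniform(source_mass(layer)) for layer in cross_attn]


def aggregate(scores, layers=(0, 1, 2, 3)):
    """Mean score over selected layers (hypothetical aggregation rule)."""
    picked = [scores[i] for i in layers]
    return sum(picked) / len(picked)


# Toy input: two layers, two target tokens, four source positions.
toy = [
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],  # evenly spread
    [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],          # collapsed
]
per_layer = layer_scores(toy)
combined = aggregate(per_layer, layers=(0, 1))
```

Keeping the per-layer scores separate before aggregating is what allows the interpretability use: the list can be inspected or plotted by layer to see where concentration changes. As the last bullet notes, none of these scores can flag a summary that attends to the right source content but misstates it.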
Topics
- Neural Machine Translation
- Abstractive Summarization
- Hallucination Detection
- Optimal Transport
- Cross-Attention
- Model Interpretability
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.