Don't ignore omissions!
Summary
The NLP community significantly under-researches omissions in text generated by Large Language Models (LLMs), despite their real-world impact in high-stakes domains such as medicine and law. "Hallucinations" (false information) receive extensive attention, while "omissions" (missing important information) are largely overlooked: ACL 2025 had 96 papers on hallucination versus 0 on omission, and EMNLP 2025 had 64 versus 1. Wu et al. (2025) found that 76% of LLM-generated responses causing serious harm in medical cases were due to omissions, not hallucinations. Omissions also pose significant risks in machine translation, risk reporting, summarization, weather forecasts, and coding assistants. Detecting omissions is harder than detecting hallucinations: it often requires domain-specific knowledge or a "gold standard" list of required content, and such lists are less common for summarization tasks.
Key takeaway
If you are an AI architect or research scientist evaluating LLM safety in critical applications such as healthcare or legal tech, prioritize robust omission detection. Accuracy-focused benchmarks alone are insufficient and likely underestimate real-world risk. Incorporate domain expert review or "gold standard" content lists into your evaluation protocols to check that all vital information is present; in many contexts, omissions can cause more severe harm than hallucinations.
Key insights
LLM omissions are a critical, under-researched problem, especially in high-stakes domains like medicine.
Principles
- Accuracy benchmarks underestimate LLM risks.
- Domain knowledge is crucial for detecting omissions.
Method
Detecting omissions often involves comparing generated text against a "gold standard" list of required content or using domain experts to identify missing key information.
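As a rough illustration of the checklist comparison described above, here is a minimal Python sketch. It assumes a hand-written "gold standard" list in which each required item comes with a few regular-expression patterns; the `RequiredItem` class, the `find_omissions` function, and the example checklist are all hypothetical and are not taken from the original article. Keyword matching is only a crude stand-in for the domain expert review the summary recommends, so treat a result like this as a first-pass flag rather than a verdict.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RequiredItem:
    """One entry in a hypothetical gold-standard content list."""
    name: str
    patterns: tuple[str, ...]  # a match on any one pattern counts as "mentioned"


def find_omissions(text: str, checklist: list[RequiredItem]) -> list[str]:
    """Return the names of checklist items that no pattern finds in the text."""
    missing = []
    for item in checklist:
        mentioned = any(
            re.search(pattern, text, flags=re.IGNORECASE)
            for pattern in item.patterns
        )
        if not mentioned:
            missing.append(item.name)
    return missing


# Illustrative checklist: the items and patterns are invented for this example.
CHECKLIST = [
    RequiredItem("dosage", (r"\bdos(e|age)\b", r"\bmg\b")),
    RequiredItem(
        "when to seek urgent care",
        (r"\bemergency\b", r"\burgent\b", r"\bcall your doctor\b"),
    ),
    RequiredItem("drug interactions", (r"\binteract(s|ions?)?\b",)),
]

response = "Take one 200 mg tablet every eight hours with food."
print(find_omissions(response, CHECKLIST))
# → ['when to seek urgent care', 'drug interactions']
```

The example response contains nothing false, so a hallucination check would pass it, yet two of the three required items are missing. A real checklist would need to be written and maintained by domain experts, and paraphrases that the patterns miss would still need human review.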
In practice
- Prioritize omission detection in medical LLM deployments.
- Develop domain-specific content checklists for evaluation.
Topics
- LLM Omissions
- Hallucination Detection
- Medical NLP
- NLG Evaluation
- AI Safety
Best for: AI Architect, AI Scientist, Research Scientist, AI Researcher, NLP Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by Ehud Reiter's Blog.