Type VI Hallucination: When LLMs Apply the Wrong Domain’s Rules
Summary
Type VI hallucination detection is a new method that addresses a gap in evaluating Large Language Model (LLM) outputs: claims that are factually correct in one domain but incorrect in the context where they are used. This "domain mismatch" arises when distinct fields share vocabulary but attach different meanings to it, for example general Machine Learning (ML) engineering versus financial services model risk management (MRM) under the SR 11-7 supervisory guidance. The proposed geometric approach builds an orthonormal basis for each domain's subspace from exemplar sentences, then measures how well the embedding of an LLM claim aligns with the expected domain compared with the other domains. A positive Type VI score indicates a domain mismatch, flagging outputs that may pass traditional factual checks (Types I-V) but are semantically inappropriate for the intended context.
Key takeaway
AI Engineers and Data Scientists building LLM applications in regulated or specialized fields should consider adding Type VI hallucination detection to catch contextually incorrect outputs. This matters most where shared vocabulary carries different meanings across domains (e.g., ML vs. regulatory compliance). The geometric check helps confirm that LLM outputs are not only factually accurate but also semantically appropriate for the target domain, which reduces the risk of misinterpretation and non-compliance.
Key insights
Type VI hallucination detection identifies domain-specific semantic mismatches in LLM outputs using geometric embedding analysis.
Principles
- Each domain occupies a distinctive region of embedding space.
- Domain mismatch is quantifiable via projection onto orthonormal bases.
Method
Build an orthonormal basis for each domain's subspace from exemplar sentence embeddings using QR decomposition. Measure a claim's "support" in each domain's subspace. The Type VI score is the support in the best-matching wrong domain minus the support in the expected domain, so a positive score means the claim sits closer to a wrong domain than to the expected one.
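The method can be sketched in a few lines of NumPy. This is a minimal illustration, not the original article's implementation: the function names (`domain_basis`, `support`, `type_vi_score`) are hypothetical, "support" is assumed here to be the norm of the unit-normalized claim embedding projected onto the domain subspace, and the 4-dimensional vectors are toy stand-ins for real sentence embeddings. It also assumes each domain's exemplar embeddings are linearly independent and fewer than the embedding dimension.

```python
import numpy as np

def domain_basis(exemplar_embeddings):
    """Return an orthonormal basis (as columns) for the span of the exemplars."""
    # Stack exemplars as columns: shape (dim, n_exemplars).
    A = np.asarray(exemplar_embeddings, dtype=float).T
    # Reduced QR: Q has orthonormal columns spanning the columns of A
    # (assuming the exemplars are linearly independent).
    Q, _ = np.linalg.qr(A)
    return Q

def support(claim_embedding, Q):
    """Assumed definition: norm of the unit claim vector projected onto span(Q)."""
    v = np.asarray(claim_embedding, dtype=float)
    v = v / np.linalg.norm(v)
    return float(np.linalg.norm(Q.T @ v))  # between 0 and 1

def type_vi_score(claim_embedding, bases, expected):
    """Support in the best-matching wrong domain minus support in the expected one."""
    s_expected = support(claim_embedding, bases[expected])
    s_wrong = max(
        support(claim_embedding, Q)
        for name, Q in bases.items()
        if name != expected
    )
    return s_wrong - s_expected

# Toy embeddings (illustrative only; real ones come from a sentence encoder).
bases = {
    "ml": domain_basis([[1, 0, 0, 0], [1, 1, 0, 0]]),
    "mrm": domain_basis([[0, 0, 1, 0], [0, 0, 1, 1]]),
}
claim = [0.0, 0.0, 1.0, 0.2]  # lies entirely in the toy "mrm" subspace

score_if_ml_expected = type_vi_score(claim, bases, expected="ml")
score_if_mrm_expected = type_vi_score(claim, bases, expected="mrm")
```

In this toy setup the claim lies in the "mrm" subspace, so the score is positive when "ml" is the expected domain and negative when "mrm" is expected. With real embeddings the subspaces overlap, so scores fall between these extremes and a threshold has to be chosen empirically.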
In practice
- Define domain exemplars from expert-written content.
- Calibrate score thresholds on known good/bad examples.
- Use for post-hoc auditing of LLM-assisted documentation.
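The calibration step in the list above could look like the following sketch. It is one simple option, assumed for illustration rather than taken from the original article: it scans candidate thresholds and keeps the one with the highest accuracy on labeled examples. The function name `calibrate_threshold` and the scores and labels are hypothetical.

```python
import numpy as np

def calibrate_threshold(scores, is_mismatch):
    """Pick the score threshold that best separates labeled mismatches.

    A claim is flagged when its Type VI score is strictly above the threshold.
    Returns (threshold, accuracy) for the first best candidate found.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_mismatch, dtype=bool)
    uniq = np.unique(scores)  # sorted unique scores
    # Candidate thresholds: midpoints between neighbouring scores.
    candidates = (uniq[:-1] + uniq[1:]) / 2 if len(uniq) > 1 else uniq
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = float(np.mean((scores > t) == labels))
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc

# Made-up calibration set: Type VI scores with expert labels.
example_scores = [-0.4, -0.2, -0.1, 0.05, 0.3, 0.5]
example_labels = [False, False, False, False, True, True]

threshold, accuracy = calibrate_threshold(example_scores, example_labels)
```

On this made-up data the selected threshold falls between 0.05 and 0.3 rather than at zero, which illustrates why calibrating on known good and bad examples can be preferable to flagging every positive score. In a regulated setting, a metric that weights missed mismatches more heavily than false alarms may be a better fit than plain accuracy.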
Topics
- LLM Hallucination Detection
- Domain Mismatch
- Sentence Embeddings
- QR Decomposition
- Model Risk Management
Best for: Machine Learning Engineer, AI Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.