Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning
Summary
LUCID is presented as the first hallucination detection method for Large Language Model-based Knowledge Graph (LLM-KG) reasoning frameworks. Even when they incorporate KG information, LLMs hallucinate at an average rate of 29.65% across three frameworks (Readi, ToG, StructGPT) on KBQA datasets such as GrailQA, WebQSP, and QALD-10. LUCID addresses this by combining LLM attention scores, KG semantics, and structural information in a graph neural network (GNN). On manually annotated benchmark datasets, LUCID achieves state-of-the-art performance against 15 baselines, outperforming SelfCheckGPT by 6.76% and ReDeEP (chunk) by 5.48% on average. It is also efficient, with an inference time of 0.04 milliseconds per sample, and can reduce QA costs by 55.4% while maintaining accuracy.
Key takeaway
If you deploy LLM-based knowledge graph reasoning systems, integrate a specialized hallucination detector such as LUCID to improve reliability and manage costs. Using LUCID's hallucination probabilities, you can reprocess only the high-risk outputs with a more powerful, more expensive model, reducing overall API costs by over 55% while keeping accuracy comparable to running the larger model on every query. This makes outputs more trustworthy for critical applications.
Key insights
LUCID detects LLM hallucinations in KG reasoning by fusing LLM attention, KG semantics, and structural information via a GNN.
Principles
- LLM attention, KG semantics, and structure are crucial for robust hallucination detection.
- RAG-specific detection methods significantly outperform general-purpose approaches.
- Graph neural networks effectively model KG topological relationships for enhanced detection.
Method
LUCID extracts node and edge features from LLM attention scores and KG semantic similarities, then feeds them into a GINE model (a graph isomorphism network that also uses edge features) to predict the probability that an answer is hallucinated.
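To make the data flow concrete, here is a minimal pure-Python sketch of one GINE-style message-passing step followed by a graph-level readout. It is not LUCID's implementation. The two-dimensional features (attention score, semantic similarity), the single layer, the mean pooling, and the fixed weights in `params` are all assumptions chosen for illustration. The update rule is the standard GINE form, h_i' = MLP((1 + eps) * h_i + sum over neighbors j of ReLU(h_j + e_ji)).

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weight, bias):
    # weight is a list of rows with shape (out_dim, in_dim).
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]

def gine_layer(node_feats, edges, edge_feats, mlp, eps=0.0):
    """One GINE-style update over directed edges given as (src, dst) pairs."""
    dim = len(node_feats[0])
    agg = [[0.0] * dim for _ in node_feats]
    for (src, dst), e in zip(edges, edge_feats):
        # Message from src to dst: ReLU(h_src + e_src->dst).
        msg = relu([h + x for h, x in zip(node_feats[src], e)])
        agg[dst] = [a + m for a, m in zip(agg[dst], msg)]
    out = []
    for h, a in zip(node_feats, agg):
        combined = [(1.0 + eps) * x + y for x, y in zip(h, a)]
        out.append(mlp(combined))
    return out

def hallucination_probability(node_feats, edges, edge_feats, params):
    """Hypothetical readout: one GINE layer, mean pooling, sigmoid."""
    def mlp(v):
        return relu(linear(v, params["w1"], params["b1"]))
    hidden = gine_layer(node_feats, edges, edge_feats, mlp)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    logit = linear(pooled, params["w_out"], params["b_out"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Toy reasoning subgraph with made-up features per node and per edge:
# [attention score, semantic similarity]. All values are illustrative.
node_feats = [[0.9, 0.8], [0.2, 0.1], [0.6, 0.7]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # undirected edges stored both ways
edge_feats = [[0.5, 0.4], [0.5, 0.4], [0.1, 0.3], [0.1, 0.3]]
params = {  # untrained, fixed weights for demonstration only
    "w1": [[0.5, -0.3], [0.2, 0.4]], "b1": [0.0, 0.1],
    "w_out": [[-1.0, 0.8]], "b_out": [0.0],
}
prob = hallucination_probability(node_feats, edges, edge_feats, params)
print(f"hallucination probability: {prob:.3f}")
```

In a real system the weights would be learned from annotated examples, and a library such as PyTorch Geometric would replace the hand-written layer. The sketch only shows how node features, edge features, and graph structure combine into a single score.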
In practice
- Employ LUCID's hallucination probabilities for cost-effective QA refinement.
- Integrate LLM attention scores and KG semantics for robust detection.
- Utilize GNNs to capture structural consistency in KG reasoning.
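The cost-aware refinement in the first point can be sketched as a simple threshold rule: keep the cheap model's answer when the detector's hallucination probability is low, and re-run only the flagged questions on a stronger model. The function names, the threshold of 0.5, and the per-query costs below are hypothetical. The source does not specify how the threshold is chosen, and the toy numbers are not results from the paper.

```python
def select_for_refinement(halluc_probs, threshold=0.5):
    """Return indices of answers whose hallucination probability is at or above the threshold."""
    return [i for i, p in enumerate(halluc_probs) if p >= threshold]

def refine(questions, cheap_answers, halluc_probs, strong_model, threshold=0.5):
    """Keep cheap answers for low-risk samples; re-answer flagged ones with strong_model."""
    flagged = set(select_for_refinement(halluc_probs, threshold))
    return [
        strong_model(q) if i in flagged else a
        for i, (q, a) in enumerate(zip(questions, cheap_answers))
    ]

def total_cost(n_samples, n_flagged, cheap_cost, strong_cost):
    """Every sample pays the cheap model once; flagged samples also pay the strong model."""
    return n_samples * cheap_cost + n_flagged * strong_cost

# Toy example with made-up probabilities and unit costs.
questions = ["q0", "q1", "q2", "q3"]
cheap_answers = ["a0", "a1", "a2", "a3"]
halluc_probs = [0.1, 0.8, 0.3, 0.9]

def strong_model(question):
    # Stand-in for a call to a larger, more expensive model.
    return "strong:" + question

flagged = select_for_refinement(halluc_probs)
refined = refine(questions, cheap_answers, halluc_probs, strong_model)
selective = total_cost(len(questions), len(flagged), cheap_cost=1.0, strong_cost=10.0)
always_strong = len(questions) * 10.0
print(flagged, refined, selective, always_strong)
```

Lowering the threshold sends more answers to the stronger model and raises cost, so the threshold is best tuned on a held-out set against the accuracy target.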
Topics
- Large Language Models
- Knowledge Graph Reasoning
- Hallucination Detection
- Graph Neural Networks
- Retrieval-Augmented Generation
- QA Refinement
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.