Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
Summary
GLOW is a hybrid system that integrates Large Language Models (LLMs) with Graph Neural Networks (GNNs) to address Open-World Question Answering (OW-QA) over incomplete or evolving knowledge graphs (KGs). Unlike traditional KGQA systems, which assume complete graphs or rely on retrieval alone, GLOW uses a pre-trained GNN to predict the top-$k$ candidate answers and retrieves relevant KG facts. Both are then serialized into a structured prompt that guides the LLM, enabling joint reasoning over symbolic and semantic signals without fine-tuning. The authors also introduce GLOW-Bench, a new 1,000-question benchmark for OW-QA that requires multi-hop reasoning across diverse domains. GLOW consistently outperformed existing LLM-GNN systems on standard benchmarks and on GLOW-Bench, with improvements of up to 53.3% and 38% on average, which the authors take as evidence of its robustness and generalizability.
Key takeaway
For research scientists developing advanced QA systems, GLOW offers a way to handle open-world question answering over incomplete knowledge graphs. Consider adopting its hybrid LLM-GNN architecture and structured prompting mechanism to improve reasoning accuracy and reduce reliance on complete graph data, especially for domain-specific or multi-hop questions. The reported results show better performance and efficiency than existing retrieval-based or purely LLM-driven solutions, which makes the method a strong candidate for real-world applications with evolving knowledge bases.
Key insights
Integrating GNN-predicted candidates and KG context into LLM prompts significantly improves open-world question answering over incomplete knowledge graphs.
Principles
- Combine structural and semantic signals for robust reasoning.
- Decouple GNN and LLM training for scalability.
- Structured prompts enhance LLM accuracy and consistency.
Method
GLOW extracts question entities, retrieves 1-hop KG triples, and uses a GNN to predict top-$k$ candidate answers. These are serialized into a structured prompt for an LLM to generate answers.
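The steps above can be sketched in a few lines of Python. This is a minimal illustration, not GLOW's actual implementation: the function names (`one_hop_triples`, `top_k_candidates`, `build_prompt`), the prompt layout, and the toy knowledge graph are all assumptions made for this example, and the `gnn_scores` dictionary merely stands in for the output of a pre-trained GNN.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def one_hop_triples(kg: List[Triple], entities: List[str]) -> List[Triple]:
    """Return every triple in which a question entity is the head or the tail."""
    wanted = set(entities)
    return [t for t in kg if t[0] in wanted or t[2] in wanted]


def top_k_candidates(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring candidates (ties broken alphabetically).

    `scores` is a hypothetical stand-in for the GNN's per-entity scores.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def build_prompt(question: str, facts: List[Triple], candidates: List[str]) -> str:
    """Serialize facts and ranked candidates into one structured text prompt.

    The section labels used here are illustrative, not the paper's template.
    """
    fact_lines = [f"- ({h}, {r}, {t})" for h, r, t in facts]
    cand_lines = [f"{i}. {c}" for i, c in enumerate(candidates, start=1)]
    return "\n".join(
        ["Question: " + question, "KG facts:", *fact_lines,
         "GNN candidate answers (ranked):", *cand_lines, "Answer:"]
    )


# Toy data, invented for illustration only.
kg = [
    ("Marie Curie", "field", "Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Albert Einstein", "field", "Physics"),
]
gnn_scores = {"Warsaw": 0.91, "Paris": 0.55, "Physics": 0.12, "Berlin": 0.07}

facts = one_hop_triples(kg, ["Marie Curie"])
candidates = top_k_candidates(gnn_scores, k=3)
prompt = build_prompt("Where was Marie Curie born?", facts, candidates)
print(prompt)
```

The resulting string would be sent to an LLM as-is. Because the GNN scores are computed separately and only their ranked output enters the prompt, the two models can be trained and swapped independently.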
In practice
- Use GLOW-GN for best performance on OW-QA tasks.
- Limit the GNN's top-$k$ candidate answers to $k = 3$ for optimal LLM accuracy.
- Employ LLM-as-a-Judge for robust answer evaluation.
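One way to set up LLM-as-a-Judge evaluation is sketched below. The judge prompt wording, the one-word verdict format, and the helper names are assumptions for this example and are not taken from the paper. The `stub_llm` function is a stand-in that only compares strings so the sketch runs offline; in practice it would be replaced by a call to a real model.

```python
from typing import Callable


def build_judge_prompt(question: str, gold: str, predicted: str) -> str:
    """Ask the judge model for a one-word verdict (hypothetical wording)."""
    return (
        "You are grading a question-answering system.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold}\n"
        f"System answer: {predicted}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )


def parse_verdict(reply: str) -> bool:
    """Return True only when the first word of the reply is CORRECT."""
    words = reply.strip().upper().split()
    return bool(words) and words[0].strip(".,:;!") == "CORRECT"


def judge(question: str, gold: str, predicted: str,
          llm: Callable[[str], str]) -> bool:
    """Run one judging round; `llm` is any function mapping prompt to reply."""
    return parse_verdict(llm(build_judge_prompt(question, gold, predicted)))


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a judge model: case-insensitive exact match."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    same = (fields["Reference answer"].strip().lower()
            == fields["System answer"].strip().lower())
    return "CORRECT" if same else "INCORRECT"


print(judge("Where was Marie Curie born?", "Warsaw", "warsaw", stub_llm))
print(judge("Where was Marie Curie born?", "Warsaw", "Paris", stub_llm))
```

A real judge model would be expected to accept paraphrases and aliases that exact matching rejects, which is the usual reason to prefer it for open-world answers.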
Topics
- Open-World Question Answering
- Knowledge Graphs
- LLM-GNN Integration
- Graph Neural Networks
- Large Language Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.