Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

2026-04-16 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

GLOW is a hybrid system that integrates Large Language Models (LLMs) with Graph Neural Networks (GNNs) for Open-World Question Answering (OW-QA) over incomplete or evolving knowledge graphs (KGs). Unlike traditional KGQA systems, which assume complete graphs or rely on retrieval, GLOW uses a pre-trained GNN to predict the top-$k$ candidate answers and retrieves relevant KG facts. Both are serialized into a structured prompt that guides the LLM, so it can reason jointly over symbolic and semantic signals without fine-tuning. The researchers also introduce GLOW-Bench, a 1,000-question OW-QA benchmark with multi-hop questions across diverse domains. On standard benchmarks and on GLOW-Bench, GLOW consistently outperforms existing LLM-GNN systems, with improvements of up to 53.3% and 38% on average.

Key takeaway

For research scientists building QA systems, GLOW offers an approach to open-world question answering over incomplete knowledge graphs. Consider its hybrid LLM-GNN architecture and structured prompting mechanism to improve reasoning accuracy and reduce reliance on complete graph data, especially for domain-specific or multi-hop questions. The method is reported to deliver better performance and efficiency than existing retrieval-based or purely LLM-driven solutions, which makes it a strong candidate for real-world applications with evolving knowledge bases.

Key insights

Integrating GNN-predicted candidates and KG context into LLM prompts significantly improves open-world question answering over incomplete knowledge graphs.

Principles

Combine structural and semantic signals for robust reasoning.
Decouple GNN and LLM training for scalability.
Structured prompts enhance LLM accuracy and consistency.

Method

GLOW extracts the entities mentioned in the question, retrieves their 1-hop KG triples, and uses a GNN to predict the top-$k$ candidate answers. The triples and candidates are then serialized into a structured prompt from which an LLM generates the answer.
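
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GLOW's implementation: the function names, the toy knowledge graph, the prompt wording, and the candidate scores are all hypothetical, and the GNN is replaced by a dictionary of precomputed scores.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Toy, hypothetical data; a real system would query a KG store and a trained GNN.
TOY_KG: List[Triple] = [
    ("Marie Curie", "field", "Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Albert Einstein", "field", "Physics"),
]
TOY_SCORES: Dict[str, float] = {
    "Poland": 0.81, "France": 0.64, "Germany": 0.12, "Russia": 0.33,
}


def one_hop_triples(kg: List[Triple], entities: List[str]) -> List[Triple]:
    """Keep triples whose head or tail is one of the question entities."""
    wanted = set(entities)
    return [t for t in kg if t[0] in wanted or t[2] in wanted]


def top_k_candidates(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Stand-in for the GNN: rank precomputed scores and keep the k best."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def build_prompt(question: str, triples: List[Triple],
                 candidates: List[Tuple[str, float]]) -> str:
    """Serialize facts and candidates into one structured prompt (assumed layout)."""
    facts = "\n".join(f"- ({h}, {r}, {t})" for h, r, t in triples)
    cands = "\n".join(
        f"{i}. {name} (score {score:.2f})"
        for i, (name, score) in enumerate(candidates, start=1)
    )
    return (
        f"Question: {question}\n\n"
        f"Knowledge graph facts:\n{facts}\n\n"
        f"Candidate answers from the GNN:\n{cands}\n\n"
        "Answer the question using the facts and candidates above."
    )


if __name__ == "__main__":
    question = "Which country was Marie Curie born in?"
    triples = one_hop_triples(TOY_KG, ["Marie Curie"])
    candidates = top_k_candidates(TOY_SCORES, k=3)
    print(build_prompt(question, triples, candidates))
```

In this toy case the 1-hop facts never mention Poland, so the answer can only reach the LLM through the candidate list, which is the situation the open-world setting is about.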

In practice

Use GLOW-GN for best performance on OW-QA tasks.
Limit the GNN's top-$k$ candidates to $k = 3$ for the best LLM accuracy.
Employ LLM-as-a-Judge for robust answer evaluation.
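
As a rough illustration of the LLM-as-a-Judge recommendation, the sketch below grades free-form predictions against reference answers by asking a model for a YES/NO verdict. The judge prompt, the function names, and the `stub_llm` stand-in are assumptions made for this example; they are not the evaluation protocol used by the GLOW authors, and a real setup would pass in a callable that wraps an actual model.

```python
from typing import Callable, List, Tuple

# Hypothetical judge prompt; the wording used in the original work is not given here.
JUDGE_TEMPLATE = (
    "You are grading a question-answering system.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Predicted answer: {prediction}\n"
    "Does the prediction convey the same answer as the reference? "
    "Reply with exactly YES or NO."
)

Example = Tuple[str, str, str]  # (question, reference, prediction)


def judge_one(llm: Callable[[str], str], question: str,
              reference: str, prediction: str) -> bool:
    """Ask the judge model for a verdict and map it to a boolean."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, prediction=prediction
    )
    return llm(prompt).strip().upper().startswith("YES")


def judged_accuracy(llm: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of predictions the judge accepts; 0.0 for an empty list."""
    if not examples:
        return 0.0
    correct = sum(judge_one(llm, q, ref, pred) for q, ref, pred in examples)
    return correct / len(examples)


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a judge model: YES if the reference appears in the prediction."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    reference = fields["Reference answer"].lower()
    prediction = fields["Predicted answer"].lower()
    return "YES" if reference in prediction else "NO"


EXAMPLES: List[Example] = [
    ("Which country was Marie Curie born in?", "Poland", "She was born in Poland."),
    ("Who wrote Hamlet?", "William Shakespeare", "Christopher Marlowe"),
]

if __name__ == "__main__":
    print(judged_accuracy(stub_llm, EXAMPLES))
```

A judge of this kind accepts paraphrased answers such as "She was born in Poland." that exact-match scoring would reject, which is the usual motivation for using one on free-form LLM output.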

Topics

Open-World Question Answering
Knowledge Graphs
LLM-GNN Integration
Graph Neural Networks
Large Language Models

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.