G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents
Summary
G-Long is a graph-enhanced framework for improving long-term consistency and efficiency in open-domain dialogue systems. It addresses two limitations of large language models (LLMs): weak long-context reasoning and the cost of processing extensive raw text. A fine-tuned small language model (sLM) handles structured triplet extraction and associative retrieval, which significantly reduces operational costs. G-Long also introduces an attention-aware importance scoring mechanism that uses the cross-attention signals of a T5 summarizer to identify salient memories. Across diverse benchmarks, G-Long achieves state-of-the-art performance, with gains of up to 9.8% in response quality on MSC and 40.8% in retrieval recall on LME, while reducing memory maintenance costs by $4.9\times$ and token consumption by 63.0%.
Key takeaway
For machine learning engineers developing long-term dialogue agents, G-Long offers a way to improve consistency while reducing operational costs. Its graph-enhanced memory and sLM-based triplet extraction deliver higher response quality and retrieval accuracy while cutting memory maintenance costs by $4.9\times$ and token consumption by 63.0%. Consider implementing a 1-hop subgraph expansion for efficient retrieval.
Key insights
G-Long uses graph-enhanced memory with a small LM and attention-aware scoring for efficient, consistent long-term dialogue.
Principles
- Structured graph memory mitigates retrieval ambiguity and information loss.
- Attention-aware importance scoring identifies salient memories without external LLM costs.
- Offloading memory construction to an sLM eliminates prohibitive LLM-API dependencies.
Method
G-Long constructs a graph memory bank by extracting (subject, relation, object) triplets using a fine-tuned sLM and assigning importance scores via a T5 summarizer's cross-attention. It then performs associative retrieval and two-stage hybrid reranking.
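The memory structure described above can be illustrated with a minimal sketch. It is not the G-Long implementation: the `Triplet` and `GraphMemory` names, the precomputed `importance` values, and the ranking by importance alone are all assumptions made for illustration. The sLM extraction step and the two-stage hybrid reranking are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    obj: str
    importance: float  # assumed precomputed (e.g. from summarizer attention)


class GraphMemory:
    """Toy graph memory bank: entities are nodes, triplets are edges."""

    def __init__(self):
        # entity -> triplets in which it appears as subject or object
        self._edges = defaultdict(list)

    def add(self, triplet):
        self._edges[triplet.subject].append(triplet)
        if triplet.obj != triplet.subject:
            self._edges[triplet.obj].append(triplet)

    def one_hop(self, seeds, top_k=5):
        """Return triplets touching any seed entity, most important first."""
        seen = set()
        hits = []
        for entity in seeds:
            for t in self._edges.get(entity, []):
                if t not in seen:
                    seen.add(t)
                    hits.append(t)
        hits.sort(key=lambda t: t.importance, reverse=True)
        return hits[:top_k]


# Hypothetical memories; in G-Long these would come from the fine-tuned sLM.
memory = GraphMemory()
memory.add(Triplet("Alice", "owns", "a dog", 0.9))
memory.add(Triplet("a dog", "is named", "Rex", 0.7))
memory.add(Triplet("Bob", "lives in", "Paris", 0.4))

retrieved = memory.one_hop(["Alice"])
expanded = memory.one_hop(["Alice", "a dog"])
```

Here `retrieved` holds only the triplet that mentions Alice directly, while `expanded` also picks up the dog's name because "a dog" is included as a seed entity. Restricting the lookup to edges incident to the seeds keeps retrieval cost proportional to the seeds' degree rather than to the size of the whole graph.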
In practice
- Fine-tune an sLM (e.g., Qwen-3-8B) for dialogue triplet extraction.
- Leverage T5 cross-attention maps for memory importance scoring.
- Employ a 1-hop subgraph expansion strategy for efficient retrieval.
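The cross-attention scoring step listed above can be sketched as follows. This summary does not say how G-Long aggregates the attention maps, so the rule used here (average over generated summary tokens, then sum over each memory's source-token span) is an assumption, as are the function name `memory_importance` and the toy matrix. Extracting the matrix from an actual T5 model is out of scope for this sketch.

```python
def memory_importance(cross_attention, spans):
    """Score each memory by the attention mass its source tokens receive.

    cross_attention: rows are generated summary tokens, columns are source
        tokens; each row is assumed to be a distribution over source tokens.
    spans: memory id -> (start, end) half-open range of source token indices.
    """
    n_steps = len(cross_attention)
    n_source = len(cross_attention[0])
    # Average attention each source token receives across summary tokens.
    per_token = [
        sum(row[j] for row in cross_attention) / n_steps
        for j in range(n_source)
    ]
    # A memory's score is the total averaged attention over its token span.
    return {
        mem_id: sum(per_token[start:end])
        for mem_id, (start, end) in spans.items()
    }


# Hypothetical attention: 2 summary tokens attending over 4 source tokens.
attention = [
    [0.5, 0.25, 0.125, 0.125],
    [0.25, 0.25, 0.25, 0.25],
]
spans = {"m1": (0, 2), "m2": (2, 4)}
scores = memory_importance(attention, spans)
```

With this toy input, memory `m1` scores higher than `m2` because the summarizer attends more to its tokens. The scores come from a forward pass that the summarizer already performs, which is consistent with the claim that importance scoring adds no external LLM cost.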
Topics
- Long-term Dialogue Systems
- Graph-Enhanced Memory
- Small Language Models
- Triplet Extraction
- Associative Retrieval
- Computational Efficiency
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.