G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

G-Long is a graph-enhanced framework designed to improve long-term consistency and efficiency in open-domain dialogue systems. It targets two limitations of large language models (LLMs): weak long-context reasoning and the cost of processing extensive raw text. It uses a fine-tuned small language model (sLM) for structured triplet extraction and associative retrieval, which significantly reduces operational costs. G-Long also introduces an attention-aware importance scoring mechanism that uses the intrinsic cross-attention signals of a T5 summarizer to identify salient memories. Experiments across diverse benchmarks show that G-Long achieves state-of-the-art results, with gains of up to 9.8% in response quality on MSC and 40.8% in retrieval recall on LME, while cutting memory maintenance costs by $4.9\times$ and token consumption by 63.0%.

Key takeaway

For machine learning engineers developing long-term dialogue agents, G-Long offers a way to improve consistency while reducing operational costs. By adopting its graph-enhanced memory and sLM-based triplet extraction, you can improve response quality and retrieval recall while cutting memory maintenance costs by $4.9\times$ and token consumption by 63.0%. Consider implementing a 1-hop subgraph expansion for efficient retrieval.
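
The two efficiency figures are quoted in different units (a reduction factor and a percentage), so it can help to put them on the same scale. The short sketch below only restates the reported numbers as "fraction of the baseline that remains"; the variable names are illustrative and nothing here comes from the paper beyond the two figures.

```python
# Reported figures from the summary.
maintenance_reduction_factor = 4.9   # memory maintenance cost reduced 4.9x
token_reduction_percent = 63.0       # token consumption reduced by 63.0%

# A k-fold reduction leaves 1/k of the baseline.
maintenance_left = 1.0 / maintenance_reduction_factor

# A p% reduction leaves (1 - p/100) of the baseline.
tokens_left = 1.0 - token_reduction_percent / 100.0

print(f"maintenance cost remaining: {maintenance_left:.1%}")  # about 20.4%
print(f"token consumption remaining: {tokens_left:.1%}")      # about 37.0%
```

Read this way, the maintenance figure corresponds to roughly an 80% saving, which is a larger relative cut than the 63.0% token reduction.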

Key insights

G-Long uses graph-enhanced memory with a small LM and attention-aware scoring for efficient, consistent long-term dialogue.

Principles

Structured graph memory mitigates retrieval ambiguity and information loss.
Attention-aware importance scoring identifies salient memories without external LLM costs.
Offloading memory construction to an sLM eliminates prohibitive LLM-API dependencies.
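
To make the attention-aware scoring principle concrete, here is a minimal sketch of one way to turn a summarizer's cross-attention matrix into per-memory importance scores. This summary does not give the paper's exact formula, so the aggregation used here (mean over decoder steps, sum within each span, then normalisation) is an assumption, and the names `span_importance`, `cross_attention`, and `spans` are hypothetical. The matrix is assumed to be already averaged over layers and heads.

```python
from typing import List, Sequence, Tuple


def span_importance(
    cross_attention: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[float]:
    """Score each source span by the attention mass it receives.

    cross_attention[t][s] is the weight that decoder step t places on
    source token s (assumed averaged over layers and heads).
    spans are half-open (start, end) token ranges, one per candidate memory.
    """
    n_steps = len(cross_attention)
    if n_steps == 0:
        return [0.0 for _ in spans]
    n_src = len(cross_attention[0])

    # Average attention each source token receives across decoder steps.
    per_token = [
        sum(row[s] for row in cross_attention) / n_steps for s in range(n_src)
    ]

    # Sum the token-level mass inside each span.
    raw = [sum(per_token[start:end]) for start, end in spans]

    # Normalise so that scores sum to 1 across the candidate spans.
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


# Toy example: 2 summary tokens attending over 4 source tokens.
attn = [
    [0.1, 0.1, 0.6, 0.2],
    [0.2, 0.0, 0.5, 0.3],
]
scores = span_importance(attn, spans=[(0, 2), (2, 4)])
print(scores)  # the second span receives most of the attention mass
```

The appeal of this kind of scoring is that the attention weights are a by-product of summarization the system already runs, so no extra LLM call is needed to rank memories.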

Method

G-Long constructs a graph memory bank by extracting (subject, relation, object) triplets using a fine-tuned sLM and assigning importance scores via a T5 summarizer's cross-attention. It then performs associative retrieval and two-stage hybrid reranking.
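
The memory bank described above can be pictured as a small in-memory structure. The sketch below is illustrative only: `Triplet` and `GraphMemory` are hypothetical names, the importance values are made up, and the sLM extraction step and the two-stage reranker are deliberately left out because this summary does not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    obj: str
    importance: float  # e.g. an attention-derived score in [0, 1]


class GraphMemory:
    """Stores triplets and indexes them by the entities they touch."""

    def __init__(self) -> None:
        self._by_entity: Dict[str, List[Triplet]] = defaultdict(list)

    def add(self, triplet: Triplet) -> None:
        # Index under both endpoints so lookups work from either side.
        self._by_entity[triplet.subject].append(triplet)
        if triplet.obj != triplet.subject:
            self._by_entity[triplet.obj].append(triplet)

    def about(self, entity: str) -> List[Triplet]:
        """Return triplets touching `entity`, most important first."""
        hits = self._by_entity.get(entity, [])
        return sorted(hits, key=lambda t: t.importance, reverse=True)


memory = GraphMemory()
memory.add(Triplet("user", "owns", "dog", importance=0.7))
memory.add(Triplet("dog", "named", "Biscuit", importance=0.9))
memory.add(Triplet("user", "likes", "jazz", importance=0.3))

for t in memory.about("dog"):
    print(t.subject, t.relation, t.obj, t.importance)
```

Indexing by entity rather than by raw text is what lets a later query about "dog" reach both the ownership fact and the pet's name without any string matching over the dialogue history.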

In practice

Fine-tune an sLM (e.g., Qwen-3-8B) for dialogue triplet extraction.
Leverage T5 cross-attention maps for memory importance scoring.
Employ a 1-hop subgraph expansion strategy for efficient retrieval.
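
As a rough illustration of the last item, a 1-hop subgraph expansion can be sketched in a few lines. This assumes memories are plain (subject, relation, object) tuples and that the seed entities have already been identified from the query; `one_hop_expand` and the sample data are hypothetical, not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def one_hop_expand(triples: Iterable[Triple], seeds: Set[str]) -> List[Triple]:
    """Return every triple with at least one endpoint in `seeds`."""
    return [t for t in triples if t[0] in seeds or t[2] in seeds]


memory = [
    ("user", "owns", "dog"),
    ("dog", "named", "Biscuit"),
    ("Biscuit", "visits", "vet"),
    ("user", "likes", "jazz"),
]

# Only triples directly attached to "dog" are kept; the vet visit is
# two hops away and is therefore excluded.
subgraph = one_hop_expand(memory, seeds={"dog"})
print(subgraph)
```

Stopping at one hop keeps the retrieved context small, which is consistent with the efficiency goal stated above; deeper expansion would pull in more loosely related facts at a higher token cost.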

Topics

Long-term Dialogue Systems
Graph-Enhanced Memory
Small Language Models
Triplet Extraction
Associative Retrieval
Computational Efficiency

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.