SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, long

Summary

SproutRAG is an attention-guided hierarchical Retrieval-Augmented Generation (RAG) framework for long documents that targets the trade-off between retrieval granularity and contextual coherence. It builds a binary chunking tree by merging sentence-level chunks into progressively larger, semantically coherent units using learned inter-sentence attention. Unlike prior methods, it requires no costly LLM calls during indexing or retrieval, no fixed context expansion, and no lossy summarization. It learns which attention heads and layers best capture a document's semantic structure, and it retrieves at multiple granularities via hierarchical beam search. Trained end-to-end with a joint objective, SproutRAG improves information efficiency (IE) by 6.1% on average across scientific, legal, and open-domain benchmarks, including gains of 8.06 points on Dragonball and 6.83 points on MS MARCO. It also reports stronger end-to-end performance at 4.38K online tokens per query and 193 ms latency.

Key takeaway

If you are a machine learning engineer building RAG systems over long documents, consider SproutRAG as an alternative to fixed-chunking and LLM-heavy pipelines. Its attention-guided hierarchical indexing and multi-granularity retrieval improve information efficiency and end-to-end answer quality, particularly on tasks that require cross-paragraph synthesis, while keeping online inference costs low.

Key insights

SproutRAG uses learned attention to build a multi-granularity RAG tree, improving retrieval efficiency and coherence without LLM calls.

Method

Documents are split into sentence-level chunks and encoded by an SLLM. Learned attention weights over head-layer pairs guide bottom-up construction of the binary tree. At query time, hierarchical beam search retrieves candidates at multiple granularities.
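
The pipeline described in this section can be illustrated with a minimal, pure-Python sketch. It is not the paper's implementation. Everything named here is a hypothetical stand-in: `combine_heads` mixes per-head sentence-to-sentence attention maps with given weights (in SproutRAG these weights are learned), `build_tree` greedily merges the adjacent pair of spans with the highest mean attention, parent embeddings are assumed to be length-weighted means of their children, and `beam_search` scores nodes by cosine similarity to the query. The toy vectors and attention maps are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    start: int                      # first sentence index covered (inclusive)
    end: int                        # last sentence index covered (inclusive)
    vec: List[float]                # embedding of the span
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combine_heads(attn_maps, weights):
    """Weighted sum of per-(layer, head) sentence-to-sentence attention maps."""
    n = len(attn_maps[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, attn_maps))
             for j in range(n)] for i in range(n)]


def span_affinity(attn, a, b):
    """Mean attention, taken in both directions, between two spans."""
    pairs = [(i, j) for i in range(a.start, a.end + 1)
             for j in range(b.start, b.end + 1)]
    return sum(attn[i][j] + attn[j][i] for i, j in pairs) / (2 * len(pairs))


def build_tree(sent_vecs, attn):
    """Bottom-up binary tree: repeatedly merge the most related adjacent pair.

    Assumes at least one sentence. Mean pooling is an assumption of this sketch.
    """
    nodes = [Node(i, i, list(v)) for i, v in enumerate(sent_vecs)]
    while len(nodes) > 1:
        k = max(range(len(nodes) - 1),
                key=lambda i: span_affinity(attn, nodes[i], nodes[i + 1]))
        a, b = nodes[k], nodes[k + 1]
        la, lb = a.end - a.start + 1, b.end - b.start + 1
        vec = [(la * x + lb * y) / (la + lb) for x, y in zip(a.vec, b.vec)]
        nodes[k:k + 2] = [Node(a.start, b.end, vec, a, b)]
    return nodes[0]


def beam_search(root, query, beam_width=2, top_k=2):
    """Descend level by level, keeping the best `beam_width` children.

    Every visited node (leaf or internal) is a retrieval candidate, so the
    result can mix single sentences with larger spans.
    """
    beam, visited = [root], []
    while beam:
        visited.extend(beam)
        children = [c for n in beam for c in (n.left, n.right) if c is not None]
        children.sort(key=lambda n: cosine(n.vec, query), reverse=True)
        beam = children[:beam_width]
    visited.sort(key=lambda n: cosine(n.vec, query), reverse=True)
    return visited[:top_k]


# Toy document: sentences 0-1 share one topic, sentences 2-3 another.
sent_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
head_a = [[0.0, 0.8, 0.1, 0.1], [0.8, 0.0, 0.1, 0.1],
          [0.1, 0.1, 0.0, 0.8], [0.1, 0.1, 0.8, 0.0]]
head_b = [[0.0, 0.6, 0.2, 0.2], [0.6, 0.0, 0.2, 0.2],
          [0.2, 0.2, 0.0, 0.6], [0.2, 0.2, 0.6, 0.0]]
attn = combine_heads([head_a, head_b], weights=[0.7, 0.3])
root = build_tree(sent_vecs, attn)
hits = beam_search(root, query=[1.0, 0.0], beam_width=1, top_k=2)
print([(h.start, h.end) for h in hits])
```

On this toy input the tree groups sentences 0-1 and 2-3 before joining them at the root, and a query aligned with the first topic returns both the first sentence and its two-sentence parent span. Returning internal nodes alongside leaves is what makes the retrieval multi-granular in this sketch.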

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.