MAGE-RAG: Multigranular Adaptive Graph Evidence for Agentic Multimodal RAG in Long-Document QA

2026-06-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

MAGE-RAG is a multigranular adaptive graph evidence framework for agentic multimodal RAG in long-document question answering. Existing RAG methods struggle to locate sparse evidence spread across text, tables, images, charts, and complex layouts in long PDFs, and they often settle on a static trade-off between evidence coverage, noise, and inference cost. MAGE-RAG uses page retrieval as its entry point. Offline, it builds an evidence graph of page and element nodes whose edges encode relations such as containment, reading order, and semantic neighborhood. At query time, an online evidence controller iteratively activates, opens, searches, and prunes evidence under explicit budgets, then renders a compact, relevant evidence subgraph for the Large Vision-Language Model (LVLM). In experiments, MAGE-RAG reaches 52.75 overall accuracy on LongDocURL and 53.26 accuracy with 51.19 F1 on MMLongBench-Doc, indicating a better balance between covering dispersed evidence and controlling context noise.
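To make the offline evidence graph concrete, here is a minimal sketch of one way to represent page and element nodes with containment, reading-order, and semantic-neighbor edges. The schema (`Node`, `EvidenceGraph`, `build_graph`), the relation labels, the cosine threshold, and the toy 2-d embeddings are all hypothetical; the source names the node and relation types but not a data model or embedding method.

```python
import math
from dataclasses import dataclass, field


# Hypothetical schema: the source names page/element nodes and containment,
# reading-order, and semantic-neighbor relations, but not a concrete data model.
@dataclass
class Node:
    node_id: str
    kind: str                  # "page" or "element"
    modality: str = ""         # e.g. "text", "table", "image", "chart"
    embedding: tuple = ()      # element embedding from some encoder (assumed)


@dataclass
class EvidenceGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst, relation) triples

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, dst, relation):
        self.edges.append((src, dst, relation))

    def neighbors(self, node_id, relation=None):
        return [d for s, d, r in self.edges
                if s == node_id and (relation is None or r == relation)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(pages, sim_threshold=0.8):
    """pages: [(page_id, [(elem_id, modality, embedding), ...]), ...] in reading order."""
    g = EvidenceGraph()
    elem_ids = []
    prev = None
    for page_id, elems in pages:
        g.add_node(Node(page_id, "page"))
        for elem_id, modality, emb in elems:
            g.add_node(Node(elem_id, "element", modality, emb))
            g.add_edge(page_id, elem_id, "contains")   # containment
            if prev is not None:
                g.add_edge(prev, elem_id, "next")      # reading order, across pages too
            prev = elem_id
            elem_ids.append(elem_id)
    # Semantic neighbors: connect element pairs whose embeddings are similar.
    for i, a in enumerate(elem_ids):
        for b in elem_ids[i + 1:]:
            if cosine(g.nodes[a].embedding, g.nodes[b].embedding) >= sim_threshold:
                g.add_edge(a, b, "semantic")
                g.add_edge(b, a, "semantic")
    return g


# Toy two-page document with made-up 2-d embeddings.
doc = [
    ("p1", [("p1_text", "text", (1.0, 0.0)), ("p1_table", "table", (0.9, 0.1))]),
    ("p2", [("p2_image", "image", (0.0, 1.0))]),
]
g = build_graph(doc)
print(g.neighbors("p1", "contains"))       # ['p1_text', 'p1_table']
print(g.neighbors("p1_text", "semantic"))  # ['p1_table']
```

The pairwise similarity loop is quadratic in the number of elements; for real long documents an approximate nearest-neighbor index would be the more practical choice.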

Key takeaway

For machine learning engineers building multimodal RAG systems over long documents, MAGE-RAG offers a way to work within context limits while keeping noise down. Consider an adaptive, graph-based evidence construction strategy that dynamically balances evidence coverage against inference cost. It lets your LVLM consume compact, relevant evidence, which the reported results suggest improves accuracy on complex QA tasks spanning diverse document elements such as text, tables, and images.
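One way to picture "compact, relevant information" under an inference-cost budget is a simple packing step: keep the highest-scoring evidence that fits a token budget, then restore document order before handing it to the LVLM. This is a minimal sketch under stated assumptions: the `render_context` helper, the element dict schema, greedy packing, and the word-count token proxy are all hypothetical, and the text serialization stands in for whatever rendering (possibly visual crops) the actual system uses.

```python
def render_context(elements, token_budget):
    """Pack the highest-scoring evidence into a budget, then restore reading order.

    elements: dicts with keys page, order, modality, content, score (hypothetical schema).
    Token cost is approximated by whitespace-separated words, a crude stand-in
    for a real tokenizer.
    """
    kept, used = [], 0
    for el in sorted(elements, key=lambda e: -e["score"]):
        cost = len(el["content"].split())
        if used + cost <= token_budget:   # greedy: skip items that would overflow
            kept.append(el)
            used += cost
    kept.sort(key=lambda e: (e["page"], e["order"]))  # present in document order
    return "\n".join(f"[p{e['page']} {e['modality']}] {e['content']}" for e in kept)


evidence = [
    {"page": 3, "order": 0, "modality": "table", "content": "revenue 2023 12.4", "score": 0.9},
    {"page": 1, "order": 2, "modality": "text", "content": "revenue grew in 2023", "score": 0.8},
    {"page": 5, "order": 0, "modality": "chart",
     "content": "quarterly revenue by region and segment", "score": 0.7},
    {"page": 2, "order": 1, "modality": "image", "content": "company logo", "score": 0.1},
]
print(render_context(evidence, token_budget=8))
```

With a budget of 8, the table and text snippets fit, the chart would overflow, and the low-scoring logo no longer fits either; the surviving items are emitted page 1 first, then page 3. Greedy packing is not optimal in general, but it is cheap and easy to reason about.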

Key insights

MAGE-RAG uses an adaptive evidence graph and query-time control for efficient multimodal RAG in long documents.

Principles

Multigranular evidence graphs improve context relevance.
Adaptive evidence construction balances coverage and noise.
Page retrieval can serve as an effective entry point.
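The page-retrieval entry point can be illustrated as ranking whole pages against the query and keeping the top-k as seeds for graph exploration. In this sketch the helper `retrieve_seed_pages` and the 2-d embeddings are hypothetical; in a visual setup the page embeddings would come from some page-image encoder, which the source does not specify.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_seed_pages(query_emb, page_embs, k=2):
    """Rank whole pages against the query and return the top-k page ids.

    page_embs maps page_id -> page embedding (encoder assumed, not specified
    by the source). The returned ids serve as entry points into the graph.
    """
    ranked = sorted(page_embs, key=lambda pid: cosine(query_emb, page_embs[pid]),
                    reverse=True)
    return ranked[:k]


# Made-up 2-d embeddings for three pages.
page_embs = {"p1": (1.0, 0.0), "p2": (0.6, 0.8), "p3": (0.0, 1.0)}
seeds = retrieve_seed_pages((0.0, 1.0), page_embs, k=2)
print(seeds)  # ['p3', 'p2']
```

Starting from pages keeps the first retrieval step coarse and cheap; finer-grained element evidence is reached only by expanding from these seeds.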

Method

MAGE-RAG builds an offline evidence graph of page and element nodes linked by containment, reading-order, and semantic-neighbor relations. At query time, an online controller iteratively activates, opens, searches, and prunes evidence under explicit budget constraints to form a compact evidence subgraph for the LVLM.
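The activate/open/search/prune loop can be sketched as a budgeted breadth-first walk over the graph. This is one hypothetical interpretation, not the authors' controller: `run_controller`, the precomputed relevance `score` map, the `min_score` threshold, and the two budgets (`max_steps` for node visits, `max_elements` for the final evidence set) are all assumptions made for illustration.

```python
from collections import deque


def run_controller(seed_pages, contains, links, score,
                   max_steps=10, max_elements=4, min_score=0.2):
    """Hypothetical budgeted evidence controller.

    seed_pages:   page ids from page retrieval (the "activate" step)
    contains:     page_id -> element ids on that page (containment edges)
    links:        element_id -> neighboring element ids (reading-order/semantic edges)
    score:        element_id -> query relevance in [0, 1], assumed precomputed
    max_steps:    budget on node visits (pages opened + elements inspected)
    max_elements: budget on the size of the final evidence set
    """
    frontier = deque(seed_pages)          # activate: seed pages enter the frontier
    visited, selected = set(), set()
    steps = 0
    while frontier and steps < max_steps:
        node = frontier.popleft()
        if node in visited:
            continue
        visited.add(node)
        steps += 1
        if node in contains:              # open: expand a page into its elements
            frontier.extend(contains[node])
            continue
        if score.get(node, 0.0) < min_score:
            continue                      # prune: drop low-relevance elements
        selected.add(node)
        if len(selected) > max_elements:  # prune: enforce the evidence budget
            worst = min(selected, key=lambda e: (score[e], e))
            selected.discard(worst)
            if worst == node:
                continue                  # new element did not make the cut
        frontier.extend(links.get(node, []))  # search: follow graph edges outward
    return sorted(selected, key=lambda e: (-score[e], e))


contains = {"p1": ["e1", "e2"], "p2": ["e4"]}
links = {"e1": ["e3"], "e3": ["e4"]}
score = {"e1": 0.9, "e2": 0.1, "e3": 0.7, "e4": 0.5}
print(run_controller(["p1"], contains, links, score, max_elements=2))  # ['e1', 'e3']
```

Note that `e4` lives on page `p2`, which was never retrieved; it is reached only by following a graph edge from `e3`, which is the kind of dispersed evidence a page-only retriever would miss. With `max_elements=2` it is then pruned as the weakest candidate.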

In practice

Implement graph-based RAG for complex PDF QA.
Design adaptive evidence controllers for budget management.
Integrate page-level visual retrieval as a RAG starting point.

Topics

Multimodal RAG
Long-Document QA
Evidence Graphs
Large Vision-Language Models
Information Retrieval
Adaptive Retrieval

Code references

laonuo2004/MAGE-RAG

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.