MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval
Summary
MCompassRAG is a metadata-guided retrieval framework designed to overcome the chunk granularity trade-off in Retrieval-Augmented Generation (RAG) systems, particularly for deep research tasks. It enriches coarse document chunks with topic metadata and uses these topic-level signals as a "semantic compass" to guide retrieval. A lightweight retriever is trained through LLM-teacher distillation, so topic-aware retrieval at inference time requires no additional LLM calls. The reported results are an average 8.24% improvement in information efficiency (IE) across six complex retrieval benchmarks and over 5x lower latency than leading efficient RAG baselines. The core innovation is making larger chunks more precisely searchable by integrating selected and abstracted topic metadata into the embedding space.
Key takeaway
For Machine Learning Engineers optimizing Retrieval-Augmented Generation (RAG) systems, MCompassRAG addresses the persistent chunk granularity dilemma. If your current RAG implementation suffers from high latency due to inference-time LLM calls, or from noisy retrieval over large document chunks, topic metadata guidance is worth investigating. The framework reports higher information efficiency and lower latency, offering a path to more precise and cost-effective RAG without sacrificing context. Treat its one-time training cost as an investment toward inference-time gains and cross-domain generalizability.
Key insights
Topic metadata, selected and abstracted, can efficiently guide RAG retrieval over coarse chunks, improving precision and latency.
Principles
- Enrich coarse chunks with topic metadata for searchability.
- Distill LLM teacher knowledge into a lightweight retriever.
- Abstract selected topic signals for refined query guidance.
Method
MCompassRAG first runs a topic model over the chunks and stores each chunk's topic distribution in a metadata bank. At inference, it selects and abstracts the query-relevant topic metadata, uses it to form a topic-aware query vector, and scores that vector against the enriched chunks with an MLP.
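The inference flow described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`enrich_chunks`, `topic_aware_query`, `mlp_score`), the top-k selection rule, the softmax-weighted averaging used for "abstraction", the concatenation-based enrichment, and all dimensions are assumptions, and random arrays stand in for real embeddings, topic distributions, and trained MLP weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CHUNKS, N_TOPICS, DIM, HIDDEN = 5, 8, 16, 32  # toy sizes (assumed)

# Random stand-ins for real model outputs.
chunk_emb = rng.normal(size=(N_CHUNKS, DIM))   # chunk embeddings
topic_emb = rng.normal(size=(N_TOPICS, DIM))   # one embedding per topic
# Metadata bank: one topic distribution per chunk (rows sum to 1).
metadata_bank = rng.dirichlet(np.ones(N_TOPICS), size=N_CHUNKS)


def enrich_chunks(chunk_emb, metadata_bank, topic_emb):
    """Append each chunk's expected topic embedding to its own embedding."""
    chunk_topic_vec = metadata_bank @ topic_emb            # (N_CHUNKS, DIM)
    return np.concatenate([chunk_emb, chunk_topic_vec], axis=1)


def topic_aware_query(query_emb, topic_emb, top_k=3):
    """Select the top_k topics most similar to the query (assumed rule)
    and abstract them into one softmax-weighted vector."""
    sims = topic_emb @ query_emb                           # (N_TOPICS,)
    selected = np.argsort(sims)[-top_k:]
    weights = np.exp(sims[selected] - sims[selected].max())
    weights /= weights.sum()
    abstracted = weights @ topic_emb[selected]             # (DIM,)
    return np.concatenate([query_emb, abstracted]), selected


def mlp_score(query_vec, enriched_chunks, w1, b1, w2, b2):
    """Score every (query, chunk) pair with a two-layer ReLU MLP."""
    n = len(enriched_chunks)
    tiled_query = np.broadcast_to(query_vec, (n, query_vec.size))
    pairs = np.concatenate([tiled_query, enriched_chunks], axis=1)
    hidden = np.maximum(pairs @ w1 + b1, 0.0)
    return (hidden @ w2 + b2).ravel()


# Untrained, random MLP weights; a real system would learn these.
w1 = rng.normal(size=(4 * DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
w2 = rng.normal(size=(HIDDEN, 1)) * 0.1
b2 = np.zeros(1)

enriched = enrich_chunks(chunk_emb, metadata_bank, topic_emb)
query_vec, selected_topics = topic_aware_query(rng.normal(size=DIM), topic_emb)
scores = mlp_score(query_vec, enriched, w1, b1, w2, b2)
ranking = np.argsort(scores)[::-1]  # chunk indices, best score first
```

Note that nothing in this scoring path calls an LLM, which is consistent with the summary's claim about inference-time cost; the weights here are untrained, so the resulting ranking is meaningless beyond demonstrating the data flow.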
In practice
- Employ CEMTM with Qwen3-Embedding-4B for topic modeling.
- Train the retriever via distillation on synthetic or general datasets.
- Tune topic model granularity (the number of topics K, e.g., K=100) for optimal results.
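To make the distillation step more concrete, here is one common way a student retriever can be trained to match an LLM teacher's relevance judgments: a listwise KL-divergence loss over the candidate chunks for a single query. The summary does not state which loss MCompassRAG uses, so the objective, the `temperature` parameter, and the function names below are assumptions chosen purely for illustration.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate chunks.

    teacher_scores: relevance scores from the LLM teacher (hypothetical input).
    student_scores: scores from the lightweight retriever for the same chunks.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    eps = 1e-12  # guards against log(0)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


# Toy example: the teacher prefers the third chunk; one student agrees, one does not.
teacher = [0.1, 0.5, 2.0]
loss_agree = distillation_loss(teacher, [0.0, 0.4, 1.9])
loss_disagree = distillation_loss(teacher, [2.0, 0.5, 0.1])
```

In this setup the loss is lower when the student ranks candidates the way the teacher does, so minimizing it over many queries pushes the retriever toward the teacher's preferences; the teacher is only needed while training.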
Topics
- Retrieval-Augmented Generation
- Topic Modeling
- LLM Distillation
- Semantic Retrieval
- Information Efficiency
- Chunk Granularity
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.