Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

DICE (Document Inference via Chunk Evidence) is a training-free strategy for improving dense retrieval on long documents by addressing "document-side early compression": the failure mode in which crucial, localized evidence in a long document is diluted when the whole document is encoded into a single vector. DICE mitigates this by splitting documents into chunks, encoding each chunk independently with a frozen model and local position indices, and then aggregating the chunk embeddings into a single document vector, so the standard one-query-one-document retrieval interface is preserved. On the LongEmbed benchmark with Dream, Mistral, Llama3, and Qwen backbones, DICE substantially improves retrieval, particularly for contexts longer than 4k tokens. For Dream, Passkey (>4k) scores rose from 30.0 to 90.0, and Needle (>4k) scores from 23.3 to 74.0. DICE also reduced the Evidence Dilution Index (EDI) in 92.8% of 12,779 samples.
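To make the preserved interface concrete, here is a minimal sketch of ranking over an index that holds exactly one vector per document, which is the shape of output DICE produces. The document vectors and the `cosine` and `rank` helpers are hypothetical toy values and names, not taken from the DICE code, and cosine similarity is assumed as the scoring function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_index):
    """Return document ids sorted by similarity to the query, best first."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_index.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy index: one aggregated vector per document, however many
# chunks each document was split into before pooling.
doc_index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.2],
    "doc_c": [0.0, 0.2, 0.9],
}

print(rank([1.0, 0.2, 0.0], doc_index))  # prints ['doc_a', 'doc_b', 'doc_c']
```

Because each document still maps to a single vector, an existing vector store and search path can stay unchanged; only the way the stored vectors are computed differs.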

Key takeaway

Machine learning engineers optimizing long-document retrieval systems should consider DICE as a way to improve performance without retraining models. If your system suffers from "document-side early compression" on documents longer than 4k tokens, adopting DICE's chunking and aggregation strategy can substantially improve recall. Expect a 3-4x increase in document-side encoding cost; this trade-off is often acceptable for offline indexing, where each document is encoded once and then queried many times.

Key insights

Document-side chunk aggregation with local position encoding prevents evidence dilution in long-document dense retrieval.

Principles

Localized evidence is diluted by single-vector compression.
Delaying compression improves long-document retrieval.
Chunk granularity is a decisive design factor.

Method

DICE splits each document into token chunks, encodes each chunk independently with a frozen model using local position indices, and then aggregates the chunk embeddings into a single document vector via query-independent pooling (e.g., mean pooling).
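The method can be sketched as follows, assuming the frozen encoder is exposed as a function that takes a chunk's token ids and position ids and returns one embedding. `toy_encode` is a hypothetical stand-in for a real LLM embedder, and all function names here are illustrative, not the API of the PunchlineAAAA/DICE repository. Mean pooling is used as the query-independent aggregation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Assumed encoder signature: (token_ids, position_ids) -> one embedding.
Encoder = Callable[[Sequence[int], Sequence[int]], Vector]

def split_into_chunks(token_ids: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split a tokenized document into consecutive chunks of at most chunk_size tokens."""
    return [list(token_ids[i:i + chunk_size]) for i in range(0, len(token_ids), chunk_size)]

def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors; it never looks at the query."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def dice_document_embedding(token_ids: Sequence[int], encode: Encoder,
                            chunk_size: int = 1024) -> Vector:
    """Encode each chunk independently with local positions, then mean-pool."""
    chunk_embeddings = []
    for chunk in split_into_chunks(token_ids, chunk_size):
        # Local position indices: every chunk restarts at 0.
        position_ids = list(range(len(chunk)))
        chunk_embeddings.append(encode(chunk, position_ids))
    return mean_pool(chunk_embeddings)

def toy_encode(chunk: Sequence[int], position_ids: Sequence[int]) -> Vector:
    """Hypothetical frozen-encoder stand-in: [chunk length, largest position index]."""
    return [float(len(chunk)), float(max(position_ids))]

doc = list(range(2500))  # a fake 2,500-token document
print(dice_document_embedding(doc, toy_encode, chunk_size=1024))
```

For a 2,500-token document, `toy_encode` sees chunks of 1024, 1024, and 452 tokens whose largest position indices are 1023, 1023, and 451, confirming that positions restart in every chunk before the mean is taken.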

In practice

Use a chunk size of 1024 tokens, which gave the best average performance.
Prefer mean pooling for robust aggregation across tasks.
Reset position indices locally within each chunk.
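The difference between keeping global offsets and resetting positions locally shows up clearly when you list the last position index of each chunk. This sketch only builds index lists with the hypothetical helpers `global_position_ids` and `local_position_ids`; how such indices are passed to a specific model (for example through a `position_ids` argument) depends on that model and is not shown.

```python
def global_position_ids(num_tokens, chunk_size):
    """Position indices if each chunk kept its offset within the full document."""
    return [list(range(start, min(start + chunk_size, num_tokens)))
            for start in range(0, num_tokens, chunk_size)]

def local_position_ids(num_tokens, chunk_size):
    """Position indices reset to 0 at the start of every chunk."""
    return [list(range(min(chunk_size, num_tokens - start)))
            for start in range(0, num_tokens, chunk_size)]

n, size = 5000, 1024
print([ids[-1] for ids in global_position_ids(n, size)])  # prints [1023, 2047, 3071, 4095, 4999]
print([ids[-1] for ids in local_position_ids(n, size)])   # prints [1023, 1023, 1023, 1023, 903]
```

With global offsets the later chunks reach positions near 5,000, whereas with local indices no chunk goes beyond position 1023, so every chunk is encoded within the same position range.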

Topics

Long-Document Retrieval
Dense Retrieval
Chunk Evidence Aggregation
Evidence Dilution Index
DICE Algorithm
LLM Embedders
LongEmbed Benchmark

Code references

PunchlineAAAA/DICE

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.