DR-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion
Summary
DR-DCI is a retriever-steered Direct Corpus Interaction (DCI) framework for agentic search over large corpora. Plain DCI gives an agent flexible, shell-executable operations over the corpus, but it becomes slow and unstable as the corpus grows. DR-DCI addresses this by treating retrieval as an agent-callable action: the agent pulls relevant documents into an evolving local workspace and runs DCI operations there. The design combines retriever-level recall with DCI-style precision, so the system scales while keeping local operations for evidence resolution. On BrowseComp-Plus, DR-DCI reaches 71.2% accuracy, an 8.3-point improvement over raw DCI and ablated variants, while reducing tool usage, wall time, and estimated cost. Adding workspace-preserving context reset raises accuracy to 73.3%. DR-DCI remains effective from 100K to 10M documents, outperforming raw DCI and BM25, and scales to a 20M-scale file-per-document Wiki-18 QA setting, where it averages 63.0 across six benchmarks.
Key takeaway
AI engineers building agentic search systems over large document corpora should consider the DR-DCI approach. The reported results show higher accuracy and efficiency than raw DCI or BM25-based methods, especially at the scale of millions of documents. Dynamic workspace expansion and agent-callable retrieval keep performance stable as the corpus grows; on BrowseComp-Plus the reported gain is 8.3 points over raw DCI and ablated variants, with accuracy reaching 73.3% when context reset is added.
Key insights
DR-DCI scales direct corpus interaction by dynamically expanding a local workspace via retriever-steered agent actions.
Principles
- Combine retriever recall with DCI precision.
- Dynamic workspace expansion improves stability.
- Ranked previews and inter-document DCI are key.
Method
Agents dynamically pull relevant documents into a local workspace using retriever-steered actions, then perform DCI operations within this evolving context.
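The loop described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the toy `CORPUS`, the term-overlap `retrieve` function (a stand-in for a real retriever such as BM25 or a dense model), and the `Workspace` class with its `expand` and `grep` methods are all hypothetical names introduced here. The workspace is held in memory for brevity, whereas a file-per-document setup would copy files into a local directory and run shell tools over them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy corpus; a real corpus would hold millions of documents.
CORPUS = {
    "doc1": "The Eiffel Tower was completed in 1889 in Paris.",
    "doc2": "Paris is the capital of France.",
    "doc3": "The Statue of Liberty was dedicated in 1886.",
    "doc4": "Gustave Eiffel's company built the Eiffel Tower.",
    "doc5": "Mount Everest is the highest mountain on Earth.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, k=2):
    """Stand-in retriever: rank documents by term overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in CORPUS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked[:k] if score > 0]


@dataclass
class Workspace:
    docs: dict = field(default_factory=dict)  # doc_id -> text

    def expand(self, query, k=2):
        """Retrieval as an agent action: pull new top-k hits into the workspace."""
        added = [d for d in retrieve(query, k) if d not in self.docs]
        for doc_id in added:
            self.docs[doc_id] = CORPUS[doc_id]
        return added

    def grep(self, pattern):
        """Local DCI-style operation: regex search over workspace documents only."""
        return sorted(d for d, text in self.docs.items() if re.search(pattern, text))


ws = Workspace()
first = ws.expand("Eiffel Tower completed")  # pulls doc1 and doc4
second = ws.expand("France capital")         # pulls doc2
hits = ws.grep(r"18\d\d")                    # searches only the three pulled docs
```

In this sketch `grep` never touches `doc3`, even though it would match the pattern, because the agent has not retrieved it. That is the trade the method makes: local operations stay cheap because they run over a small workspace, and recall depends on the agent issuing further retrieval actions when the workspace lacks the evidence it needs.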
In practice
- Implement agent-callable retrieval for corpus exploration.
- Utilize workspace-preserving context reset for accuracy gains.
- Design systems with ranked previews for DCI.
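The summary does not spell out how workspace-preserving context reset or ranked previews work, so the following is only one plausible reading, sketched under stated assumptions: the reset is assumed to clear the agent's message history once it exceeds a budget while leaving the workspace untouched, and a ranked preview is assumed to be a rank-ordered list of truncated snippets. `AgentState`, `ranked_previews`, `reset_context`, `step`, and the character-based budget are hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    question: str
    messages: list = field(default_factory=list)   # model context, grows each step
    workspace: dict = field(default_factory=dict)  # doc_id -> text, survives resets


def ranked_previews(ranked_docs, width=40):
    """Format retrieval hits as rank-ordered, truncated one-line previews."""
    return [
        f"{rank}. {doc_id}: {text[:width]}"
        for rank, (doc_id, text) in enumerate(ranked_docs, start=1)
    ]


def context_size(state):
    """Approximate context usage as total characters (an assumed proxy for tokens)."""
    return sum(len(m) for m in state.messages)


def reset_context(state):
    """Drop the message history but keep the workspace as it is."""
    listing = ", ".join(sorted(state.workspace))
    state.messages = [f"Question: {state.question}", f"Workspace docs: {listing}"]


def step(state, observation, budget=200):
    """Record one observation, resetting the context if it exceeds the budget."""
    state.messages.append(observation)
    if context_size(state) > budget:
        reset_context(state)


state = AgentState(question="When was the Eiffel Tower completed?")
state.workspace["doc1"] = "The Eiffel Tower was completed in 1889 in Paris."
previews = ranked_previews([("doc1", state.workspace["doc1"])], width=20)

for _ in range(5):
    step(state, "x" * 60)  # simulate long tool outputs filling the context
```

After the loop the message history has been cut back to the question plus a workspace listing, while `state.workspace` still holds every document pulled so far, so the agent can resume local operations without retrieving again.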
Topics
- Agentic Search
- Direct Corpus Interaction
- Dynamic Workspace Expansion
- Information Retrieval
- Corpus Scaling
- BrowseComp-Plus
Best for: Machine Learning Engineer, NLP Engineer, Research Scientist, AI Scientist, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.