Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

The DR-DCI framework introduces a retriever-steered Direct Corpus Interaction (DCI) approach to agentic search over large corpora. Traditional DCI offers flexible, shell-executable operations over the corpus, but it scales poorly and becomes slow and unstable on large collections. DR-DCI addresses this by treating retrieval as an agent-callable action that pulls relevant documents into an evolving local workspace, where the agent then runs DCI operations. This design pairs retriever-level recall with DCI-style precision: retrieval keeps the search scalable, while local operations remain available for resolving evidence. On BrowseComp-Plus, DR-DCI reaches 71.2% accuracy, an 8.3-point improvement over raw DCI and ablated variants, while also reducing tool usage, wall-clock time, and estimated cost. Adding workspace-preserving context reset raises accuracy further to 73.3%. DR-DCI remains effective as the corpus grows from 100K to 10M documents, outperforming raw DCI and BM25, and it scales to a 20M-scale, file-per-document Wiki-18 QA setting, with an average score of 63.0 across six benchmarks.
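To make the workspace idea concrete, here is a minimal Python sketch, assuming a file-per-document layout like the Wiki-18 setting. The toy corpus, the term-overlap scorer standing in for a real retriever (BM25 or dense), and the names `retrieve`, `Workspace`, `expand`, and `grep` are hypothetical illustrations, not the paper's implementation. Retrieval is exposed as an action that copies only top-ranked, not-yet-seen documents into a local directory, and the DCI-style regex search touches only files already in that directory.

```python
import os
import re
import tempfile
from collections import Counter

# Hypothetical toy corpus standing in for a large collection (one entry per document).
CORPUS = {
    "doc1": "The Eiffel Tower was completed in 1889 in Paris.",
    "doc2": "The Louvre is the world's most-visited museum, in Paris.",
    "doc3": "Mount Fuji is the highest mountain in Japan.",
    "doc4": "Gustave Eiffel's company designed and built the tower.",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, k):
    """Rank documents by clipped query-term overlap (a stand-in for BM25 or a dense retriever)."""
    q = Counter(tokenize(query))
    scores = {
        doc_id: sum(min(q[t], c) for t, c in Counter(tokenize(text)).items())
        for doc_id, text in CORPUS.items()
    }
    ranked = sorted(scores, key=lambda d: (-scores[d], d))  # best score first, ties by ID
    return [d for d in ranked[:k] if scores[d] > 0]


class Workspace:
    """Local directory that grows each time the agent calls the retrieval action."""

    def __init__(self, root):
        self.root = root
        self.doc_ids = set()

    def expand(self, query, k=2):
        """Agent-callable action: write unseen top-k hits to disk, one file per document."""
        added = []
        for doc_id in retrieve(query, k):
            if doc_id in self.doc_ids:
                continue
            with open(os.path.join(self.root, f"{doc_id}.txt"), "w") as f:
                f.write(CORPUS[doc_id])
            self.doc_ids.add(doc_id)
            added.append(doc_id)
        return added

    def grep(self, pattern):
        """DCI-style operation: regex search over workspace files only, not the full corpus."""
        hits = []
        for name in sorted(os.listdir(self.root)):
            with open(os.path.join(self.root, name)) as f:
                for line in f:
                    if re.search(pattern, line):
                        hits.append((name[: -len(".txt")], line.strip()))
        return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ws = Workspace(root)
        print(ws.expand("Eiffel tower Paris"))  # ['doc1', 'doc4']
        print(ws.expand("Louvre museum", k=1))  # ['doc2']
        print(ws.grep(r"\b18\d\d\b"))  # [('doc1', 'The Eiffel Tower was completed in 1889 in Paris.')]
```

Because `grep` scans only the workspace, its cost in this sketch grows with the number of pulled documents rather than with corpus size, which is one plausible reading of why the approach stays stable as the corpus scales.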

Key takeaway

For AI engineers building agentic search systems over large document corpora, DR-DCI is worth considering. In the reported experiments it delivers higher accuracy and efficiency than raw DCI or BM25-based baselines, particularly as the corpus grows to millions of documents. Its core ingredients are dynamic workspace expansion and retrieval exposed as an agent-callable action; on BrowseComp-Plus these yielded an 8.3-point accuracy gain (to 71.2%), rising to 73.3% with workspace-preserving context reset.
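The summary does not spell out how workspace-preserving context reset works. One plausible reading, sketched here purely as an assumption, is that the agent's message history is cleared when it outgrows a budget while the pulled documents stay in the workspace, so the agent can resume from its evidence instead of re-retrieving. `AgentState`, `maybe_reset`, the word-count proxy for tokens, and the restatement message are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical agent state: transient dialogue history plus a persistent workspace."""

    history: list = field(default_factory=list)  # messages that would be sent to the LLM
    workspace: set = field(default_factory=set)  # IDs of documents already pulled locally
    resets: int = 0

    def context_size(self):
        # Crude proxy for token count: whitespace-separated words across the history.
        return sum(len(m.split()) for m in self.history)

    def maybe_reset(self, budget, task):
        """If the history exceeds `budget`, drop it but keep the workspace and restate the task."""
        if self.context_size() <= budget:
            return False
        self.history = [
            task,
            "Workspace already contains: " + ", ".join(sorted(self.workspace)),
        ]
        self.resets += 1
        return True


if __name__ == "__main__":
    task = "Who designed the tower completed in 1889?"
    state = AgentState(history=[task], workspace={"doc1", "doc4"})
    state.history.append("grep output " * 50)  # a long tool result bloats the context
    print(state.maybe_reset(budget=40, task=task))  # True
    print(state.history[1])  # Workspace already contains: doc1, doc4
```

The design choice illustrated is that the workspace lives outside the prompt, so a reset discards only the conversational trace, not the retrieved evidence.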

Key insights

DR-DCI scales direct corpus interaction by dynamically expanding a local workspace via retriever-steered agent actions.

Principles

Method

Agents dynamically pull relevant documents into a local workspace through retriever-steered actions, then run DCI operations over that evolving workspace rather than over the full corpus.
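The method can be read as a simple control loop. In this hedged Python sketch, a scripted policy stands in for the LLM and emits three actions: `retrieve` (expand the workspace), `grep` (a DCI-style operation over the workspace only), and `answer`. The corpus, the word-overlap retriever, the action schema, and the names `run_agent` and `scripted_policy` are hypothetical choices for illustration, not the paper's interface.

```python
import re

# Hypothetical corpus standing in for a large indexed collection.
CORPUS = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "d2": "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "d3": "Pierre Curie shared the 1903 physics prize.",
}


def retrieve(query, k=3):
    """Stand-in retriever: up to k documents sharing a word with the query, in ID order."""
    words = set(query.lower().split())
    return [d for d, t in sorted(CORPUS.items()) if words & set(t.lower().split())][:k]


def run_agent(policy, max_steps=5):
    """Execute agent-emitted actions against an evolving in-memory workspace."""
    workspace, observations = {}, []
    for _ in range(max_steps):
        action = policy(observations)
        if action["name"] == "retrieve":
            # Expand the workspace with documents not pulled before.
            new = [d for d in retrieve(action["query"]) if d not in workspace]
            workspace.update({d: CORPUS[d] for d in new})
            observations.append(("retrieve", new))
        elif action["name"] == "grep":
            # DCI-style operation: it sees only documents already in the workspace.
            hits = [d for d, t in sorted(workspace.items()) if re.search(action["pattern"], t)]
            observations.append(("grep", hits))
        elif action["name"] == "answer":
            return action["value"], observations
    return None, observations  # step budget exhausted without an answer


def scripted_policy(observations):
    """Fixed plan standing in for an LLM's tool calls: retrieve, then grep, then answer."""
    if not observations:
        return {"name": "retrieve", "query": "Curie Nobel"}
    if len(observations) == 1:
        return {"name": "grep", "pattern": r"Chemistry.*19\d\d"}
    return {"name": "answer", "value": observations[-1][1]}  # the grep hits


if __name__ == "__main__":
    print(run_agent(scripted_policy))
    # (['d2'], [('retrieve', ['d1', 'd2', 'd3']), ('grep', ['d2'])])
```

Treating retrieval as just another action lets the agent decide when to widen the workspace and when to refine within it, mirroring the recall-then-precision split the summary describes.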

Best for: Machine Learning Engineer, NLP Engineer, Research Scientist, AI Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.