Sarang Kulkarni on Lessons from Building Deep Research Agents in Production
Summary
Sarang Kulkarni from Thoughtworks presented at the Arc of AI Conference 2026 on lessons from designing and deploying Deep Research Agents in production, particularly for healthcare and pharmaceutical R&D. These AI agents conduct multi-step internet research, using dynamic reasoning and multi-hop information retrieval to generate comprehensive analytical reports. Kulkarni highlighted the $2.6B cost of bringing a new drug to market and the challenge of accessing existing knowledge. His team's system evolved from a RAG chatbot to an Agentic RAG++ system built around clarification, research, and writing loops. Key components include a RAG tool with weighted hybrid search and a text2sql tool. He addressed failure modes such as high token cost, latency, and "context anxiety," and proposed solutions such as explicit think-act loops and harness engineering to improve agent reliability and accountability.
Key takeaway
AI engineers building multi-agent systems in critical domains such as R&D should prioritize explicit think-act loops and robust reflection mechanisms to manage long-horizon tasks and ensure data completeness. Apply harness engineering principles to build reliable, accountable agents, focusing on tool design, memory systems, and validation checks. This approach mitigates issues such as "context anxiety" and improves the accuracy of complex, multi-step research outputs, which can in turn shorten project timelines.
Key insights
Deep Research Agents leverage multi-loop architectures and harness engineering to perform complex, reliable internet research for critical industries.
Principles
- Long-horizon tasks require explicit think-act loops.
- Process reflection enhances agent completeness and synthesis.
- Better AI models reduce the complexity of the agent harness.
Method
The Deep Research Agent system combines three loops: a clarification loop, a research loop (think, plan, execute, reflect, adjust), and a writing loop (write, reflect). It relies on RAG and text2sql tools, with reflection used for error correction and synthesis.
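The research loop described above can be sketched roughly as follows. This is a minimal illustration, not the system presented in the talk: the names `ResearchState`, `research_loop`, `plan`, `execute`, and `reflect` are hypothetical, and the demo callables are plain stubs standing in for model and tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    """Hypothetical container for what the loop knows so far."""
    question: str
    pending: list[str] = field(default_factory=list)      # sub-questions still to answer
    findings: dict[str, str] = field(default_factory=dict)  # sub-question -> notes
    steps: int = 0


def research_loop(
    question: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], str],
    reflect: Callable[[ResearchState], list[str]],
    max_steps: int = 10,
) -> ResearchState:
    """Think/plan once, then execute, reflect, and adjust until done or out of budget."""
    state = ResearchState(question=question)
    state.pending = list(plan(question))           # think + plan
    while state.pending and state.steps < max_steps:
        sub_question = state.pending.pop(0)
        state.findings[sub_question] = execute(sub_question)  # execute (tool call)
        state.steps += 1
        for gap in reflect(state):                 # reflect: which gaps remain?
            # adjust: extend the plan, skipping anything already covered or queued
            if gap not in state.findings and gap not in state.pending:
                state.pending.append(gap)
    return state


# Stub callables (assumptions for illustration only).
def demo_plan(question: str) -> list[str]:
    return ["approved indications", "trial outcomes"]


def demo_execute(sub_question: str) -> str:
    return f"notes on {sub_question}"


def demo_reflect(state: ResearchState) -> list[str]:
    # Pretend reflection spots a missing topic once trial outcomes are known.
    return ["safety signals"] if "trial outcomes" in state.findings else []


if __name__ == "__main__":
    result = research_loop("What is known about drug X?", demo_plan, demo_execute, demo_reflect)
    print(result.steps, list(result.findings))
```

The `max_steps` budget is one simple way to bound token cost and latency on long-horizon tasks; a production harness would likely add validation checks and memory on top of this skeleton.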
In practice
- Implement weighted hybrid search for RAG tools.
- Incorporate reflection loops for data completeness and process assessment.
- Design explicit think-act loops for multi-step tasks.
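As an illustration of the weighted hybrid search mentioned in the list above, the sketch below fuses keyword and vector scores with a single weight. The summary does not say how the talk's RAG tool weights or normalizes scores, so min-max normalization, the `alpha` parameter, and the function names `min_max` and `weighted_hybrid` are all assumptions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_hybrid(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank documents by alpha * vector + (1 - alpha) * keyword (assumed formula).

    A document missing from one retriever contributes 0 for that retriever.
    """
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    keyword = {"a": 10.0, "b": 5.0, "c": 0.0}   # e.g. BM25-style scores (made up)
    vector = {"a": 0.2, "b": 0.9, "d": 0.6}     # e.g. cosine similarities (made up)
    for doc, score in weighted_hybrid(keyword, vector, alpha=0.5):
        print(doc, round(score, 3))
```

Setting `alpha` closer to 1 favors semantic matches, while values closer to 0 favor exact keyword hits, which can matter for domain-specific identifiers such as compound names.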
Topics
- Deep Research Agents
- Multi-Agent Systems
- Retrieval-Augmented Generation
- Harness Engineering
- Healthcare R&D
- AI Agent Reliability
Best for: AI Architect, MLOps Engineer, AI Scientist, AI Engineer, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.