Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A multi-turn evaluation of Deep Research Agents (DRAs) examines whether they can improve their reports in response to feedback, rather than assessing only single-shot outputs. The researchers test agents under two conditions, self-reflection and process-level feedback, and introduce Research Gap Inference (RGI) to infer gaps in the research process from rubric criteria. The findings, published on 2026-06-08, show that self-reflection yields negligible net improvement: agents incorporate and regress on rubric criteria at similar rates. By contrast, a single round of process-level feedback raises normalized scores by roughly 8-15 points, with a 35-40% incorporation rate. These gains do not compound, however. In subsequent turns, agents regress on up to 24% of previously satisfied criteria, so reliable multi-turn improvement remains elusive for current DRA architectures. Code and results are publicly available.
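
The incorporation and regression figures above compare per-criterion rubric outcomes between two turns. The sketch below shows one plausible way to compute them. The exact definitions used in the paper are not given here, so treat these as assumptions: incorporation rate is the share of previously unsatisfied criteria that become satisfied, regression rate is the share of previously satisfied criteria that become unsatisfied, and the normalized score is the weighted share of satisfied criteria on a 0-100 scale. The names `turn_delta`, `normalized_score`, and `TurnDelta` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnDelta:
    incorporated: int          # criteria unsatisfied before, satisfied after
    regressed: int             # criteria satisfied before, unsatisfied after
    incorporation_rate: float  # incorporated / previously unsatisfied (assumed definition)
    regression_rate: float     # regressed / previously satisfied (assumed definition)
    net_change: int            # incorporated minus regressed


def normalized_score(satisfied: list[bool], weights: list[float]) -> float:
    """Weighted share of satisfied rubric criteria, scaled to 0-100 (assumed definition)."""
    if len(satisfied) != len(weights):
        raise ValueError("one weight per criterion is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return 100.0 * sum(w for s, w in zip(satisfied, weights) if s) / total


def turn_delta(before: list[bool], after: list[bool]) -> TurnDelta:
    """Compare two turns scored against the same rubric, criterion by criterion."""
    if len(before) != len(after):
        raise ValueError("both turns must be scored on the same rubric")
    incorporated = sum(1 for b, a in zip(before, after) if not b and a)
    regressed = sum(1 for b, a in zip(before, after) if b and not a)
    n_unsatisfied = sum(1 for b in before if not b)
    n_satisfied = len(before) - n_unsatisfied
    return TurnDelta(
        incorporated=incorporated,
        regressed=regressed,
        incorporation_rate=incorporated / n_unsatisfied if n_unsatisfied else 0.0,
        regression_rate=regressed / n_satisfied if n_satisfied else 0.0,
        net_change=incorporated - regressed,
    )


if __name__ == "__main__":
    # Toy six-criterion rubric; the values are illustrative, not results from the paper.
    before = [True, True, False, False, True, False]
    after = [True, False, True, True, True, False]
    print(turn_delta(before, after))
    print(normalized_score(after, [1.0] * len(after)))
```

In the toy example two criteria are incorporated and one regresses, giving a net change of +1. The self-reflection finding corresponds to `net_change` staying near zero because incorporation and regression roughly cancel.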

Key takeaway

For machine learning engineers developing Deep Research Agents: a single round of process-level feedback improves report quality, but current architectures do not compound those gains over further turns. Prioritize single-round, targeted feedback, and design systems that minimize regression on previously satisfied criteria. Avoid complex multi-turn feedback loops until agents can revise reliably without regressing.
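
One way to act on this advice is to apply a single feedback round and keep the revision only if it does not lose previously satisfied criteria. The sketch below is a hypothetical acceptance gate, not something described in the paper. It assumes you have a `score` function that returns one pass/fail flag per rubric criterion and a `revise` function that wraps the agent; both names, and `revise_once_with_guard` itself, are placeholders.

```python
from typing import Callable, Sequence

Report = str
# Hypothetical scorer: one bool per rubric criterion for a given report.
Scorer = Callable[[Report], Sequence[bool]]
# Hypothetical reviser: the agent producing a new report from the old one plus feedback.
Reviser = Callable[[Report, str], Report]


def revise_once_with_guard(
    report: Report,
    feedback: str,
    revise: Reviser,
    score: Scorer,
    max_regressions: int = 0,
) -> tuple[Report, bool]:
    """Run one feedback round; keep the revision only if it loses at most
    `max_regressions` previously satisfied criteria and does not lower the
    total number of satisfied criteria. Returns (report, accepted)."""
    before = list(score(report))
    candidate = revise(report, feedback)
    after = list(score(candidate))
    if len(before) != len(after):
        raise ValueError("scorer must return the same number of criteria each time")
    regressions = sum(1 for b, a in zip(before, after) if b and not a)
    if regressions <= max_regressions and sum(after) >= sum(before):
        return candidate, True
    return report, False  # fall back to the previous report


if __name__ == "__main__":
    # Toy rubric: each criterion is "keyword appears in the report".
    keywords = ["baseline", "ablation", "limitations"]

    def keyword_score(r: Report) -> list[bool]:
        return [k in r for k in keywords]

    def appending_reviser(r: Report, fb: str) -> Report:
        return r + " " + fb

    print(revise_once_with_guard("baseline results", "add ablation",
                                 appending_reviser, keyword_score))
```

The gate trades some potential gains for stability: a revision that incorporates new criteria while dropping an old one is rejected outright. Whether that trade-off is worthwhile depends on having a trustworthy rubric scorer at revision time, which is itself an assumption.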

Key insights

Deep Research Agents gain from a first round of process-level feedback but struggle to sustain improvement over multiple turns, because later revisions regress on criteria they had already satisfied.

Principles

Method

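Research Gap Inference (RGI) analyzes which rubric criteria a report satisfies and which it misses, and from that pattern infers gaps in the agent's research process, which can then be returned to the agent as process-level feedback.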
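
The summary does not describe how RGI performs the inference, so the sketch below only illustrates the input and output shape the description implies: unsatisfied rubric criteria go in, and feedback grouped by research-process stage comes out. The `Criterion` type, the stage labels, and the `STAGE_HINTS` lookup table are all hypothetical; a fixed table stands in for whatever inference the authors actually use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    text: str
    process_stage: str  # hypothetical label such as "search" or "synthesis"


# Hypothetical process-level hints, one per research stage.
STAGE_HINTS = {
    "search": "Broaden the search to cover the missing material",
    "source_evaluation": "Re-check source quality and coverage",
    "synthesis": "Revisit how findings are combined and compared",
}


def infer_process_gaps(criteria: list[Criterion], satisfied: list[bool]) -> dict[str, list[str]]:
    """Group unsatisfied criteria by the process stage they are tagged with."""
    if len(criteria) != len(satisfied):
        raise ValueError("one satisfaction flag per criterion is required")
    gaps: dict[str, list[str]] = defaultdict(list)
    for criterion, ok in zip(criteria, satisfied):
        if not ok:
            gaps[criterion.process_stage].append(criterion.text)
    return dict(gaps)


def render_feedback(gaps: dict[str, list[str]]) -> list[str]:
    """Turn grouped gaps into one process-level feedback message per stage."""
    messages = []
    for stage, missed in sorted(gaps.items()):
        hint = STAGE_HINTS.get(stage, f"Revisit the '{stage}' step")
        messages.append(f"{hint}: {'; '.join(missed)}.")
    return messages


if __name__ == "__main__":
    rubric = [
        Criterion("cites recent benchmarks", "search"),
        Criterion("compares methods on cost", "synthesis"),
        Criterion("states dataset licences", "source_evaluation"),
    ]
    for line in render_feedback(infer_process_gaps(rubric, [False, False, True])):
        print(line)
```

The point of grouping by stage is that the feedback addresses how the agent researched ("broaden the search") rather than dictating report content, which is the distinction between process-level feedback and a plain list of failed criteria.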

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.