Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
Summary
Retrospective Harness Optimization (RHO) is a self-supervised method that improves an AI agent's harness using only the agent's past trajectories, which sidesteps the need for ground-truth validation data. RHO selects a diverse coreset of challenging tasks from the agent's history and re-solves them in parallel. The agent then analyzes these rollouts with self-validation and self-consistency checks and generates candidate harness updates. The update to apply is chosen by the agent's own pairwise self-preference. Evaluated across software engineering, technical work, and knowledge work domains, RHO improved results without external grading; on SWE-Bench Pro, the pass rate rose from 59% to 78% (19 percentage points) in a single optimization round. The optimization targets prior failure modes, which changes the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.
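As a rough illustration of the self-consistency signal mentioned above, the sketch below takes the final answers from several parallel rollouts of one task and reports the majority answer and the share of rollouts that agree with it. The function name and the string labels are hypothetical; the summary does not say how RHO actually measures agreement, so treat this as one plausible reading rather than the authors' implementation.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority_answer, agreement) for final answers from parallel rollouts.

    Hypothetical helper: agreement is the fraction of rollouts that produced
    the most common answer. Low agreement flags a task worth analyzing.
    """
    if not answers:
        raise ValueError("need at least one rollout answer")
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Four rollouts of the same task; labels stand in for real final outputs.
rollouts = ["patch_a", "patch_a", "patch_b", "patch_a"]
answer, agreement = self_consistency(rollouts)
print(answer, agreement)  # patch_a 0.75
```

No ground-truth label is needed here: disagreement among the agent's own rollouts is the only signal used.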
Key takeaway
For AI engineers deploying LLM agents where ground-truth validation data is unavailable, Retrospective Harness Optimization (RHO) offers a self-supervised way to improve the harness. The agent's own past trajectories are used to identify and fix failure modes without external grading. Consider integrating RHO for continuous, adaptive agent improvement, with the aim of higher accuracy and more robust behavior in long-horizon sessions.
Key insights
Retrospective Harness Optimization (RHO) enables self-supervised AI agent improvement using past trajectories and self-preference, bypassing external validation.
Principles
- Self-supervision can optimize agent performance.
- Past failures offer valuable optimization data.
- Diverse task coresets improve learning.
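One way to picture a "diverse coreset of challenging tasks" is a greedy selection that trades off difficulty against distance from tasks already chosen. The sketch below is an assumption, not the selection rule from the article: the `failure_rate` and `embedding` fields, the scoring formula, and the function name are all hypothetical.

```python
import math


def select_coreset(tasks, k):
    """Greedily pick up to k task ids that are both hard and spread out.

    Assumed input: each task is a dict with 'id', 'embedding' (list of floats)
    and 'failure_rate' in [0, 1]. Score = failure_rate * distance to the
    nearest already-selected task (a farthest-point style heuristic).
    """
    if k <= 0 or not tasks:
        return []
    remaining = list(tasks)
    # Seed with the hardest task so the coreset starts from a real failure.
    first = max(remaining, key=lambda t: t["failure_rate"])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(task):
            nearest = min(
                math.dist(task["embedding"], s["embedding"]) for s in selected
            )
            return task["failure_rate"] * nearest
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [t["id"] for t in selected]


history = [
    {"id": "a", "embedding": [0.0, 0.0], "failure_rate": 0.9},
    {"id": "b", "embedding": [0.1, 0.0], "failure_rate": 0.8},
    {"id": "c", "embedding": [5.0, 5.0], "failure_rate": 0.5},
    {"id": "d", "embedding": [0.0, 4.0], "failure_rate": 0.1},
]
print(select_coreset(history, 2))  # ['a', 'c']
```

Task "b" is nearly as hard as "a" but almost identical to it, so the heuristic skips it in favor of the more distant task "c".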
Method
RHO selects challenging past tasks, re-solves them, uses self-validation and self-consistency, then applies self-preference to choose harness updates.
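The final selection step can be sketched as a round-robin tournament over candidate harness updates, where a judge compares two candidates at a time and the candidate with the most wins is kept. In RHO the judge is the agent itself; here `prefers_first` is a hypothetical callable, and the toy judge that favors longer text is only a placeholder so the example runs without a model.

```python
from itertools import combinations


def pick_by_pairwise_preference(candidates, prefers_first):
    """Return the candidate with the most pairwise wins.

    prefers_first(a, b) -> bool is an assumed judge interface: True means
    'a' is preferred over 'b'. Each pair is judged in both orders to reduce
    position bias. Ties on win count go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate update")
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        for first, second in ((i, j), (j, i)):
            if prefers_first(candidates[first], candidates[second]):
                wins[first] += 1
            else:
                wins[second] += 1
    best_index = max(range(len(candidates)), key=lambda idx: wins[idx])
    return candidates[best_index]


def toy_judge(a, b):
    # Placeholder only: a real judge would be a call to the agent's model.
    return len(a) >= len(b)


updates = [
    "add retry on tool timeout",
    "add retry on tool timeout and run tests before submitting",
    "no change",
]
print(pick_by_pairwise_preference(updates, toy_judge))
```

Judging every pair costs a quadratic number of comparisons, which is acceptable for a handful of candidate updates; a larger pool might call for a knockout bracket instead.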
In practice
- Optimize agents without labeled validation sets.
- Improve agent pass rates on complex tasks.
- Target specific agent failure modes.
Topics
- LLM Agents
- Self-Supervised Learning
- Harness Optimization
- Trajectory Rollouts
- Self-Preference
- SWE-Bench Pro
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.