Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

2026-06-04 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Retrospective Harness Optimization (RHO) is a self-supervised method for improving an AI agent's harness using only the agent's past trajectories, which sidesteps the need for ground-truth validation data. RHO selects a diverse coreset of challenging tasks from the agent's history and re-solves them in parallel. The agent then analyzes these rollouts with self-validation and self-consistency mechanisms and generates candidate harness updates. The final update is chosen through the agent's own pairwise self-preference. Evaluated across software engineering, technical work, and knowledge work domains, RHO improved results in each, notably raising the pass rate on SWE-Bench Pro from 59% to 78% (19 percentage points) in a single optimization round, without external grading. The optimization targets prior failure modes, changes the agent's behavior patterns, and sustains higher accuracy during long-horizon sessions.
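The summary does not spell out how the self-consistency mechanism works. One common reading is agreement among independent rollouts of the same task, and the sketch below illustrates only that reading. The function name `consistency_signal` and the idea of comparing final answers as strings are assumptions for illustration, not details from the original article.

```python
from collections import Counter
from typing import Sequence, Tuple


def consistency_signal(final_answers: Sequence[str]) -> Tuple[str, float]:
    """Return the most common final answer and the share of rollouts that agree.

    Hypothetical helper: assumes each parallel rollout of a task ends in a
    comparable final answer (for example, a normalized patch or result string).
    """
    if not final_answers:
        raise ValueError("need at least one rollout")
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)


# Four parallel rollouts of one task; three agree on the same outcome.
majority, agreement = consistency_signal(["patch_a", "patch_a", "patch_b", "patch_a"])
print(majority, agreement)
```

A low agreement share could flag a task where the current harness behaves unreliably, which is the kind of signal an agent might use when no external grader is available.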

Key takeaway

For AI engineers deploying LLM agents in environments that lack ground-truth validation data, Retrospective Harness Optimization (RHO) offers a self-supervised improvement path. You can use your agent's past trajectories to identify and correct failure modes without external grading. Consider integrating RHO for continuous agent improvement, with the aim of higher accuracy and more robust behavior in long-horizon sessions.

Key insights

Retrospective Harness Optimization (RHO) enables self-supervised AI agent improvement using past trajectories and self-preference, bypassing external validation.

Principles

Self-supervision can optimize agent performance.
Past failures offer valuable optimization data.
Diverse task coresets improve learning.
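To make the idea of a diverse coreset of challenging tasks concrete, here is a minimal sketch using greedy farthest-point selection. The article does not say how RHO builds its coreset, so everything here is an assumption: the `PastTask` fields, the `difficulty` score, the `min_difficulty` threshold, and the use of Euclidean distance between task embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PastTask:
    task_id: str
    embedding: Tuple[float, ...]  # hypothetical feature vector describing the task
    difficulty: float             # hypothetical score in [0, 1]; higher means harder


def select_coreset(tasks: Sequence[PastTask], k: int,
                   min_difficulty: float = 0.5) -> List[PastTask]:
    """Pick up to k hard tasks that are spread out in embedding space."""
    hard = [t for t in tasks if t.difficulty >= min_difficulty]
    if not hard or k <= 0:
        return []
    # Seed with the hardest task, then repeatedly add the task that is
    # farthest from everything already chosen (farthest-point heuristic).
    chosen = [max(hard, key=lambda t: t.difficulty)]
    remaining = [t for t in hard if t is not chosen[0]]
    while remaining and len(chosen) < k:
        nxt = max(
            remaining,
            key=lambda t: min(math.dist(t.embedding, c.embedding) for c in chosen),
        )
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


history = [
    PastTask("a", (0.0, 0.0), 0.9),
    PastTask("b", (0.1, 0.0), 0.8),   # hard, but nearly a duplicate of "a"
    PastTask("c", (5.0, 5.0), 0.7),
    PastTask("d", (10.0, 10.0), 0.2), # easy, filtered out
]
print([t.task_id for t in select_coreset(history, k=2)])
```

The design choice illustrated is that difficulty filters the pool while distance drives selection, so near-duplicate failures do not crowd out distinct ones.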

Method

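RHO selects a diverse set of challenging past tasks, re-solves them in parallel, analyzes the resulting rollouts with self-validation and self-consistency to generate candidate harness updates, then uses the agent's pairwise self-preference to choose among those updates.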
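The final selection step can be sketched as a small round-robin tournament. This is an illustrative sketch, not the authors' implementation: the function `pick_by_self_preference`, the `prefers` callback (standing in for a call to the agent acting as its own judge), and the choice to query both orderings of each pair are all assumptions.

```python
from itertools import permutations
from typing import Callable, Sequence


def pick_by_self_preference(
    candidates: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> str:
    """Return the candidate harness update with the most pairwise wins.

    `prefers(a, b)` is a hypothetical judge that returns True when the agent
    prefers update `a` over update `b`. Each pair is queried in both orders,
    a simple way to dampen any position bias in the judge.
    """
    if not candidates:
        raise ValueError("need at least one candidate update")
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in permutations(range(len(candidates)), 2):
        if prefers(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Ties resolve to the earliest candidate in the input order.
    best = max(wins, key=lambda i: wins[i])
    return candidates[best]


# Stub judge for demonstration only: prefers the longer description.
def stub_judge(a: str, b: str) -> bool:
    return len(a) > len(b)


updates = ["retry once", "retry once and rerun the failing tests", "rerun tests"]
print(pick_by_self_preference(updates, stub_judge))
```

In a real setup the stub judge would be replaced by a prompt to the agent comparing rollouts under each candidate update; the counting logic stays the same.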

In practice

Optimize agents without labeled validation sets.
Improve agent pass rates on complex tasks.
Target specific agent failure modes.

Topics

LLM Agents
Self-Supervised Learning
Harness Optimization
Trajectory Rollouts
Self-Preference
SWE-Bench Pro

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.