Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

Retrospective Harness Optimization (RHO) is a self-supervised method designed to improve AI agent performance by optimizing its "harness" (skills, tools, workflows) using only past trajectories, eliminating the need for ground-truth validation sets. RHO selects a diverse coreset of challenging tasks from past experiences, re-solves them in parallel, and diagnoses failures using self-validation and self-consistency signals. It then generates candidate harness updates and selects the most effective one through pairwise self-preference. Evaluated across software engineering, technical work, and knowledge work domains, RHO notably improved the pass rate on SWE-Bench Pro from 59% to 78% in a single optimization round without external grading. The optimized harness alters agent behavior, targeting prior failure modes and sustaining higher accuracy in long-horizon sessions.

Key takeaway

AI engineers deploying LLM agents in dynamic environments should consider self-supervised harness optimization as a way to keep improving agent performance without costly labeled validation data. Retrospectively analyzing past agent trajectories lets you identify and address recurring failure modes, which makes long-horizon task execution more robust and accurate. Maintain audit logs and require human approval for sensitive harness edits to reduce the risk of amplifying mistaken preferences or unsafe procedures.

Key insights

AI agents can self-improve their operational harness by retrospectively analyzing past unlabeled trajectories.

Principles

Harness optimization benefits from balancing task difficulty and diversity.
Self-validation and self-consistency signals are crucial for effective diagnosis.
Pairwise self-preference can reliably select effective harness updates.
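
To make the last principle concrete, here is a minimal sketch of selecting one candidate harness update by pairwise preference. It is not the paper's implementation: the function name, the round-robin tournament, and the both-orders comparison (a common guard against position bias) are assumptions, and `prefers` is a hypothetical stand-in for a call to the agent acting as judge.

```python
from itertools import combinations
from typing import Callable, Sequence


def select_by_pairwise_preference(
    candidates: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> tuple[int, list[int]]:
    """Return (index of the winning candidate, win counts per candidate).

    `prefers(first, second)` is a hypothetical judge: it returns True when
    the first-presented candidate is preferred. In a real system this would
    be an LLM call; here it is any callable.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        # Ask in both presentation orders so a judge that favors
        # whichever candidate is shown first does not decide the result.
        if prefers(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
        if prefers(candidates[j], candidates[i]):
            wins[j] += 1
        else:
            wins[i] += 1
    best = max(range(len(candidates)), key=lambda idx: wins[idx])
    return best, wins


# Toy judge for illustration only: prefers the longer description.
def toy_judge(first: str, second: str) -> bool:
    return len(first) > len(second)


best_index, win_counts = select_by_pairwise_preference(
    ["a", "abc", "ab"], toy_judge
)
```

A round-robin costs a quadratic number of judge calls in the number of candidates, which is acceptable for a handful of candidate updates; a knockout bracket would be a cheaper alternative for larger pools.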

Method

RHO selects a diverse, challenging coreset of past tasks, generates parallel rollouts, extracts self-validation and self-consistency signals, then proposes and selects the best harness update via self-preference.
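
The diagnostic step can be illustrated with a small sketch. The summary does not define how the two signals are computed, so everything here is an assumption: self-consistency is taken as the share of parallel rollouts that agree with the most common final answer, self-validation as the share of rollouts whose own checks passed, and the `Rollout` type, field names, and thresholds are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rollout:
    answer: str           # normalized final output of one rollout (assumed)
    self_validated: bool  # did the agent's own check pass? (assumed)


@dataclass
class Diagnosis:
    consistency: float      # fraction agreeing with the majority answer
    validation_rate: float  # fraction that passed self-validation
    suspect: bool           # True when either signal falls below threshold


def diagnose(
    rollouts: Sequence[Rollout],
    min_consistency: float = 0.8,  # hypothetical threshold
    min_validation: float = 0.8,   # hypothetical threshold
) -> Diagnosis:
    """Summarize label-free signals for one task from its parallel rollouts."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    n = len(rollouts)
    counts = Counter(r.answer for r in rollouts)
    _, majority_count = counts.most_common(1)[0]
    consistency = majority_count / n
    validation_rate = sum(r.self_validated for r in rollouts) / n
    suspect = consistency < min_consistency or validation_rate < min_validation
    return Diagnosis(consistency, validation_rate, suspect)


example = diagnose([
    Rollout("patch-A", True),
    Rollout("patch-A", True),
    Rollout("patch-B", False),
    Rollout("patch-A", False),
])
```

Neither signal needs a ground-truth label: disagreement between rollouts and failed self-checks only mark a task as worth inspecting when proposing harness updates.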

In practice

Implement a Determinantal Point Process (DPP) for coreset selection.
Use parallel rollouts to generate diagnostic signals.
Employ agent self-preference for selecting harness updates.
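
As a starting point for the first item, the sketch below selects a coreset with a greedy approximation to the most likely subset under a DPP. The source names DPPs but not the kernel or the inference procedure, so the quality-times-similarity kernel, the cosine similarity over task embeddings, the greedy selection, and all function and parameter names are assumptions made for illustration.

```python
import math
from typing import Sequence


def build_kernel(
    embeddings: Sequence[Sequence[float]],
    difficulty: Sequence[float],
) -> list[list[float]]:
    """L[i][j] = difficulty[i] * cosine(i, j) * difficulty[j] (assumed form)."""
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        unit.append([x / norm for x in vec])
    n = len(unit)
    return [
        [
            difficulty[i] * difficulty[j]
            * sum(a * b for a, b in zip(unit[i], unit[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def greedy_dpp(kernel: list[list[float]], k: int, eps: float = 1e-9) -> list[int]:
    """Greedily add the item with the largest marginal determinant gain."""
    n = len(kernel)
    chol_rows = [[] for _ in range(n)]      # incremental Cholesky rows
    gain = [kernel[i][i] for i in range(n)]  # current marginal gain per item
    selected: list[int] = []
    while len(selected) < min(k, n):
        remaining = [i for i in range(n) if i not in selected]
        j = max(remaining, key=lambda i: gain[i])
        if gain[j] < eps:
            break  # nothing left adds meaningful volume
        selected.append(j)
        root = math.sqrt(gain[j])
        for i in remaining:
            if i == j:
                continue
            dot = sum(a * b for a, b in zip(chol_rows[j], chol_rows[i]))
            e = (kernel[j][i] - dot) / root
            chol_rows[i].append(e)
            gain[i] -= e * e  # similar items lose most of their gain
    return selected


# Tasks 0 and 1 are near-duplicates; task 2 is different but easier.
task_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
task_difficulty = [1.0, 0.9, 0.8]
coreset = greedy_dpp(build_kernel(task_embeddings, task_difficulty), k=2)
```

In this toy input the hardest task is picked first, and the near-duplicate is then skipped in favor of the dissimilar task, which is the difficulty-versus-diversity trade-off described under Principles.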

Topics

LLM Agents
Harness Optimization
Self-Supervised Learning
Trajectory Analysis
SWE-Bench Pro
Determinantal Point Process

Code references

wbopan/retro-harness

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.