HARBOR: Automated Harness Optimization

· Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, medium

Summary

Harbor automates the optimization of the "harness" that wraps a large language model (LLM) in a long-horizon agent system. The harness, which includes components such as context compaction, tool caching, and semantic memory, makes up most of an agent's codebase (about 98.4% for Claude Code, for example). The paper formalizes automated harness optimization (AHO) as constrained, noisy Bayesian optimization over a mixed-variable, cost-heterogeneous configuration space. It proposes Harbor (Harness Axis-aligned Regularized Bayesian Optimization Routine) as a reference solver built from a block-additive SAAS surrogate, a multi-fidelity cost-aware acquisition function, and TuRBO trust regions. In a case study on a production coding agent, codex-py, four rounds of manual tuning produced only one statistically credible net win (17/89 vs. a 15/89 baseline), whereas an Oracle (the best-of-all-configurations union) reached 81/89. The gap shows how much headroom manual tuning leaves unrealized.
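To make the "best-of-all-configurations union" concrete, here is a minimal sketch that assumes each configuration is scored by the set of tasks it solves, so the Oracle counts a task as solved if any configuration solved it. The task IDs, configuration names, and the `oracle_union` helper are hypothetical illustrations, not data or code from the paper.

```python
from typing import Dict, Set


def oracle_union(solved_by_config: Dict[str, Set[str]]) -> Set[str]:
    """Return the tasks solved by at least one configuration."""
    solved: Set[str] = set()
    for tasks in solved_by_config.values():
        solved |= tasks
    return solved


# Toy data: hypothetical task IDs, NOT the paper's results.
solved_by_config = {
    "baseline": {"t1", "t2"},
    "round_1": {"t2", "t3"},
    "round_2": {"t1", "t4", "t5"},
}

# Best single configuration by number of solved tasks.
best_single = max(solved_by_config, key=lambda name: len(solved_by_config[name]))
oracle = oracle_union(solved_by_config)

print(f"best single config: {best_single} ({len(solved_by_config[best_single])} solved)")
print(f"oracle union: {len(oracle)} solved")
```

Under this reading, the Oracle is an upper bound on what any single configuration among those tried could reach. It is not itself a deployable configuration, since no one setting is guaranteed to solve every task in the union.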

Key takeaway

For NLP engineers and research scientists building long-horizon LLM agents, manual harness tuning alone is inefficient and leaves performance unrealized: in the codex-py case study, the only statistically credible manual win was 17/89 against a 15/89 baseline, while the Oracle union reached 81/89. Consider automated harness optimization (AHO) frameworks such as Harbor to search configurations systematically, especially in complex, flag-gated feature spaces. Doing so may improve agent performance and reduce the time spent on iterative, error-prone manual adjustments.
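As an illustration of what a flag-gated, mixed-variable configuration space can look like, the sketch below defines a small harness configuration in which some parameters are active only when their gating flag is on. Every field name, range, and choice here is a hypothetical assumption, not taken from codex-py or the paper. The sampler is plain random search, shown only to make the space concrete; it is not Harbor's Bayesian optimization.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HarnessConfig:
    # All fields are hypothetical examples of harness settings.
    context_compaction: bool
    compaction_threshold: Optional[float]  # active only if context_compaction is True
    tool_cache: bool
    tool_cache_ttl_s: Optional[int]        # active only if tool_cache is True
    memory_backend: str                    # categorical choice


def sample_config(rng: random.Random) -> HarnessConfig:
    """Draw one configuration uniformly; gated parameters stay None when inactive."""
    compaction = rng.random() < 0.5
    cache = rng.random() < 0.5
    return HarnessConfig(
        context_compaction=compaction,
        compaction_threshold=rng.uniform(0.5, 0.95) if compaction else None,
        tool_cache=cache,
        tool_cache_ttl_s=rng.choice([30, 300, 3600]) if cache else None,
        memory_backend=rng.choice(["none", "semantic"]),
    )


rng = random.Random(0)  # fixed seed for reproducibility
configs = [sample_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

Setting inactive parameters to `None` keeps two configurations that differ only in a disabled feature's settings from being treated as distinct, which is one common way to handle gated variables before handing the space to an optimizer.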

Key insights

Manual harness tuning leaves substantial headroom: in the codex-py case study, the only statistically credible manual win reached 17/89, against 81/89 for the Oracle union, which motivates automating the search.

Method

Harbor formalizes AHO as constrained noisy Bayesian optimization, using a block-additive SAAS surrogate, multi-fidelity cost-aware acquisition, and TuRBO trust regions for efficient search.
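The ingredients named above can be written in generic textbook notation as follows. This is a sketch under stated assumptions, not the paper's own formulation: the symbols $f$ (task success rate), $g_j$ (constraints), $\mathcal{X}$ (configuration space), $s$ (fidelity level), $c$ (evaluation cost), $\alpha$ (acquisition function), and $\mathcal{T}$ (current trust region) are all introduced here for illustration.

```latex
\begin{aligned}
x^\star &\in \arg\max_{x \in \mathcal{X}} f(x)
  \quad \text{s.t.} \quad g_j(x) \le 0,\; j = 1, \dots, J
  && \text{(constrained objective)} \\
y_i &= f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
  && \text{(noisy observations)} \\
f(x) &\approx \sum_{b=1}^{B} f_b\bigl(x_{[b]}\bigr)
  && \text{(block-additive surrogate)} \\
(x_{\text{next}}, s_{\text{next}}) &\in \arg\max_{x \in \mathcal{T},\, s} \frac{\alpha(x, s)}{c(x, s)}
  && \text{(cost-aware, multi-fidelity acquisition)}
\end{aligned}
```

Here $x_{[b]}$ denotes the coordinates of $x$ belonging to block $b$. Dividing the acquisition value by the evaluation cost is one common way to make an acquisition cost-aware; the paper may use a different form.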

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.