Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
Summary
Harness-1 is a 20B-parameter search agent trained with reinforcement learning and designed to improve retrieval by externalizing routine state management. Conventional search agents make a single policy responsible for both semantic search decisions and bookkeeping that could be recovered from the environment. Harness-1 instead pairs the policy with a stateful search harness that maintains environment-side working memory: a candidate pool, importance-tagged curated sets, compact evidence links, verification records, and budget-aware context rendering. The policy is left with the semantic decisions: what to search for, which documents to keep, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves an average curated recall of 0.730, 11.4 points above the next-strongest open search subagent, and it generalizes well to held-out transfer benchmarks. The code is publicly available.
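To make the headline metric concrete, here is a minimal sketch of how a curated-recall score and its cross-benchmark average could be computed. It assumes curated recall means the fraction of relevant documents that end up in the agent's final curated set, and that the reported average is an unweighted mean over benchmarks; the summary does not spell out either definition, and the function names and document ids are hypothetical.

```python
from typing import Dict, Iterable, Set


def curated_recall(curated: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of the relevant documents that appear in the curated set.

    Assumed definition; the paper may define the metric differently.
    """
    relevant_set: Set[str] = set(relevant)
    if not relevant_set:
        raise ValueError("relevant set must not be empty")
    return len(set(curated) & relevant_set) / len(relevant_set)


def macro_average(per_benchmark: Dict[str, float]) -> float:
    """Unweighted mean over benchmarks, so each benchmark counts equally."""
    if not per_benchmark:
        raise ValueError("need at least one benchmark score")
    return sum(per_benchmark.values()) / len(per_benchmark)


# Toy document ids, made up for illustration only.
toy_score = curated_recall(
    curated=["d1", "d2", "d9"],
    relevant=["d1", "d2", "d3", "d4"],
)
```

Under these assumptions, an irrelevant document in the curated set (`d9` above) does not lower the score, so a recall figure of this kind says nothing about how compact the curated set is.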
Key takeaway
For machine learning engineers building search or retrieval agents, Harness-1 makes the case for an architectural change: move routine state management out of the RL policy and into a dedicated harness. The policy can then concentrate on semantic decisions, which in the reported results comes with higher curated recall and better generalization across domains such as web, finance, and multi-hop QA. The released code is a starting point for adapting this state-externalizing design.
Key insights
Harness-1 improves RL search agents by offloading routine state management to an external harness, boosting recall and generalization.
Principles
- Offload routine state management from RL policies.
- Explicit search state improves generalization.
Method
Train a 20B RL search agent with a stateful harness that manages environment-side working memory, including candidate pools and verification records, allowing the policy to focus on semantic search decisions.
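The division of labor described above can be sketched as a small harness class plus an episode loop. This is an illustrative sketch, not the released implementation: every class, method, and field name is hypothetical, a character budget stands in for a token budget, and a scripted function stands in for the trained policy.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class SearchHarness:
    """Environment-side working memory (all names are hypothetical)."""
    candidates: Dict[str, str] = field(default_factory=dict)       # doc_id -> text
    curated: Dict[str, int] = field(default_factory=dict)          # doc_id -> importance tag
    evidence: List[Tuple[str, str]] = field(default_factory=list)  # (doc_id, claim)
    verified: Dict[str, bool] = field(default_factory=dict)        # doc_id -> check result

    def add_results(self, results: Dict[str, str]) -> None:
        """Merge search results into the candidate pool, deduplicating by id."""
        for doc_id, text in results.items():
            self.candidates.setdefault(doc_id, text)

    def keep(self, doc_id: str, importance: int, claim: str = "") -> None:
        """Record a keep decision and, optionally, the claim it supports."""
        if doc_id not in self.candidates:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = importance
        if claim:
            self.evidence.append((doc_id, claim))

    def verify(self, doc_id: str, ok: bool) -> None:
        """Store the outcome of a verification check for a document."""
        self.verified[doc_id] = ok

    def render(self, budget_chars: int) -> str:
        """Render curated docs (most important first), then pending
        candidates, stopping at the first line that would exceed the budget."""
        ranked = sorted(self.curated, key=lambda d: (-self.curated[d], d))
        pending = [d for d in self.candidates if d not in self.curated]
        lines: List[str] = []
        used = 0
        for doc_id in ranked + pending:
            tag = self.curated.get(doc_id, "?")
            line = f"[{tag}] {doc_id}: {self.candidates[doc_id]}"
            if used + len(line) > budget_chars:
                break
            lines.append(line)
            used += len(line)
        return "\n".join(lines)


Action = Dict[str, Any]


def run_episode(
    harness: SearchHarness,
    policy: Callable[[str], Action],
    search_fn: Callable[[str], Dict[str, str]],
    budget_chars: int = 200,
    max_steps: int = 8,
) -> List[str]:
    """The policy only chooses actions; the harness does the bookkeeping."""
    for _ in range(max_steps):
        action = policy(harness.render(budget_chars))
        if action["type"] == "search":
            harness.add_results(search_fn(action["query"]))
        elif action["type"] == "keep":
            harness.keep(action["doc_id"], action["importance"])
        elif action["type"] == "stop":
            break
    return sorted(harness.curated)


def toy_search(query: str) -> Dict[str, str]:
    """Substring match over a made-up three-document corpus."""
    corpus = {
        "d1": "solar panel patent",
        "d2": "solar finance report",
        "d3": "wind turbine patent",
    }
    return {k: v for k, v in corpus.items() if query in v}


# A scripted stand-in for the trained policy.
script = iter([
    {"type": "search", "query": "solar"},
    {"type": "keep", "doc_id": "d2", "importance": 2},
    {"type": "keep", "doc_id": "d1", "importance": 1},
    {"type": "stop"},
])
harness = SearchHarness()
kept = run_episode(harness, lambda context: next(script), toy_search)
```

In this sketch the policy never has to remember which documents it has already seen or kept: that state lives in the harness and is re-rendered into the context at each step, within the budget.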
In practice
- Externalize state for complex RL agents.
- Test search agents on transfer benchmarks.
Topics
- Reinforcement Learning
- Search Agents
- Retrieval Systems
- State Externalization
- Working Memory
- Multi-hop QA
Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.