SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

2026-06-07 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

SkillHone is a harness for the continual evolution of language-model agent skills. It targets a limitation of existing methods, which discard the decision history behind each skill revision. SkillHone pairs skill revisions with evaluation-side evidence and records structured histories of diagnoses, revisions, evidence, and outcomes, so refinement can continue across sessions. Role-separated subagents test candidate skills on practice probes and propose revisions informed by past decisions, so prior rationale does not have to be rediscovered. On deep-research benchmarks in a raw open-web setting, SkillHone with Qwen3.6-35B-A3B as its backbone outperforms a deep-research agent backed by commercial retrieval services. It reports improvements of 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, and also surpasses previous skill-evolution techniques.

Key takeaway

For AI engineers building continually evolving agents, SkillHone offers a framework for avoiding the loss of decision history. Consider adding persistent decision logging and structured evaluation feedback so that agents can learn from past revisions and their outcomes. This avoids rediscovering prior rationale and, in the reported results, improves performance on open-web research tasks.

Key insights

SkillHone enables continual agent skill evolution by persistently recording and reusing decision history and evaluation feedback.

Principles

Preserve decision history for agent learning.
Integrate evaluation feedback directly into revisions.
Use role-separated subagents for refinement.

Method

SkillHone records structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes, proposing revisions informed by prior decisions for cross-session refinement.
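
The loop described above can be illustrated with a minimal Python sketch. This is not SkillHone's implementation: the names DecisionRecord, SkillState, and refine, the three role callables (run_probes, diagnose, propose), and the accept-if-score-improves rule are all assumptions made for illustration. The sketch only shows the idea of recording a diagnosis, revision, evidence, and outcome for every attempt and passing that history to the roles that propose the next revision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DecisionRecord:
    """One entry in the structured history (field names are hypothetical)."""
    diagnosis: str        # what the diagnosing role thinks is wrong
    revision: str         # candidate skill text that was tried
    evidence: List[str]   # evaluation-side evidence for this attempt
    outcome: float        # probe score obtained by the candidate
    accepted: bool        # whether the candidate replaced the current skill


@dataclass
class SkillState:
    text: str
    history: List[DecisionRecord] = field(default_factory=list)


def refine(
    state: SkillState,
    run_probes: Callable[[str], float],
    diagnose: Callable[[str, List[DecisionRecord]], str],
    propose: Callable[[str, str, List[DecisionRecord]], str],
    rounds: int = 3,
) -> Tuple[SkillState, float]:
    """Run a few revision rounds, recording every decision in the history.

    Each role is a separate callable, standing in for a separate subagent.
    Acceptance rule (an assumption): keep a candidate only if it scores
    strictly higher on the practice probes than the current skill.
    """
    best = run_probes(state.text)
    for _ in range(rounds):
        diagnosis = diagnose(state.text, state.history)
        candidate = propose(state.text, diagnosis, state.history)
        score = run_probes(candidate)
        accepted = score > best
        state.history.append(
            DecisionRecord(
                diagnosis=diagnosis,
                revision=candidate,
                evidence=[f"probe_score={score}"],
                outcome=score,
                accepted=accepted,
            )
        )
        if accepted:
            state.text, best = candidate, score
    return state, best
```

Rejected candidates are recorded as well as accepted ones, so a later proposer can see what was already tried and why it failed.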

In practice

Implement persistent decision logging for agents.
Design subagents for iterative skill refinement.
Feed evaluation results back into the agent's skill-revision loop.
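
One simple way to make decision history persist across sessions is an append-only JSON Lines file, sketched below. The file layout, the record keys ("revision", "accepted"), and the helper names append_decision, load_history, and already_rejected are assumptions for illustration, not part of SkillHone as described in the source.

```python
import json
from pathlib import Path
from typing import Dict, List


def append_decision(log_path: Path, record: Dict) -> None:
    """Append one decision record as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_history(log_path: Path) -> List[Dict]:
    """Load all decisions from earlier sessions (empty list if no log yet)."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def already_rejected(history: List[Dict], revision: str) -> bool:
    """True if this exact revision was tried before and not accepted."""
    return any(
        r.get("revision") == revision and not r.get("accepted", False)
        for r in history
    )
```

A new session would call load_history at startup and check already_rejected before spending probe runs on a candidate. An append-only log keeps earlier rationale intact, at the cost of needing compaction if the history grows large.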

Topics

Agent Skill Evolution
Language Model Agents
Decision History
Continual Learning
Open-Web Research
GAIA Benchmark
WebWalkerQA-EN

Best for: Research Scientist, AI Scientist, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.