From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new technical report investigates how reusable experience should be represented for effective test-time control and iterative evolution, based on 4,590 controlled trials across 45 scientific code-solving scenarios. The study finds that documentation-oriented "Skill" packages provide unstable control because their useful signal is sparse, and that expanding them often degrades performance. In contrast, a compact "Gene" representation yields the strongest overall average performance, remains competitive under structural perturbations, and outperforms matched-budget Skill fragments. The report also finds that Gene is the better carrier for iterative experience accumulation, and that attached failure history is more effective when distilled into compact warnings than when naively appended. On CritPt, Gene-evolved systems improved over their respective base models from 9.1% to 18.57% and from 17.7% to 27.14%.
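
To put the reported figures in perspective, the short Python sketch below works out the average number of trials per scenario and the absolute and relative CritPt gains. It uses only the numbers quoted in the summary; the variable and function names are illustrative, and the per-scenario figure is a plain average that assumes nothing about how the trials were actually distributed.

```python
# Figures quoted in the summary; nothing here is new data.
TRIALS = 4590
SCENARIOS = 45

# (base score %, Gene-evolved score %) for the two reported CritPt pairs.
CRITPT_PAIRS = [(9.1, 18.57), (17.7, 27.14)]


def gain(base: float, evolved: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, evolved/base ratio)."""
    return round(evolved - base, 2), round(evolved / base, 2)


# Plain average; the real design may not split trials evenly.
trials_per_scenario = TRIALS / SCENARIOS
gains = [gain(base, evolved) for base, evolved in CRITPT_PAIRS]

print(f"average trials per scenario: {trials_per_scenario:.0f}")
for (base, evolved), (abs_pts, ratio) in zip(CRITPT_PAIRS, gains):
    print(f"{base}% -> {evolved}%: +{abs_pts} points, {ratio}x the base score")
```

Both pairs gain roughly 9.5 percentage points, which is about a doubling for the weaker base model (2.04x) and about a 1.5x improvement for the stronger one (1.53x).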

Key takeaway

For AI scientists designing systems that learn from experience, the report favors compact, control-oriented representations such as Gene over documentation-heavy Skill packages. Distill failure history into concise warnings rather than appending it raw. In the study, Gene-evolved systems raised CritPt scores from 9.1% to 18.57% and from 17.7% to 27.14% over their respective base models.
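
As a rough illustration of the difference between appending failure history and distilling it, the Python sketch below contrasts the two. It is not the report's method: the failure records, their field names, and both helper functions are hypothetical, and the distillation rule (count repeated failure kinds, keep the most frequent) is only one simple way to compress a log.

```python
from collections import Counter

# Hypothetical failure log; the record fields are invented for illustration.
FAILURES = [
    {"kind": "unit mismatch", "detail": "Mixed eV and J in the energy sum at step 3"},
    {"kind": "unit mismatch", "detail": "Returned a length in nm where m was expected"},
    {"kind": "off-by-one", "detail": "Loop skipped the final grid point of the mesh"},
]


def append_naively(failures: list[dict]) -> str:
    """Baseline: paste every raw failure record into the context."""
    return "\n".join(f"{f['kind']}: {f['detail']}" for f in failures)


def distill_warnings(failures: list[dict], max_warnings: int = 3) -> str:
    """Collapse repeated failure kinds into short, ranked one-line warnings."""
    counts = Counter(f["kind"] for f in failures)
    lines = [
        f"WARNING: avoid {kind} (seen {n}x)"
        for kind, n in counts.most_common(max_warnings)
    ]
    return "\n".join(lines)


naive = append_naively(FAILURES)
distilled = distill_warnings(FAILURES)
print(distilled)
```

The distilled text stays bounded by `max_warnings` however long the log grows, whereas the naive version grows with every recorded failure.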

Key insights

Compact, control-oriented "Gene" representations are superior for experience reuse and evolution compared to documentation-heavy "Skill" packages.

Principles

Method

The study evaluates experience representations (Skill vs. Gene) through 4,590 controlled trials in 45 code-solving scenarios, assessing their impact on test-time control and iterative evolution.

Best for: AI Scientist, Machine Learning Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.