From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution
Summary
A technical report investigates how reusable experience should be represented for test-time control and iterative evolution, running 4,590 controlled trials across 45 scientific code-solving scenarios. It finds that documentation-oriented "Skill" packages provide unstable control because their useful signal is sparse, and that expanding them often degrades performance. By contrast, a compact "Gene" representation achieves the strongest overall average performance, remains competitive under structural perturbations, and outperforms Skill fragments at a matched budget. The report also shows that Gene is a better carrier for iterative experience accumulation: attached failure history is more effective when distilled into compact warnings than when naively appended. On CritPt, Gene-evolved systems improved over their base models from 9.1% to 18.57% and from 17.7% to 27.14%.
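To put the CritPt figures quoted above in perspective, the short calculation below converts each base-to-evolved pair into an absolute gain (percentage points) and a relative gain. It is only arithmetic on the numbers in this summary; the helper name `gain` is illustrative and does not come from the report.

```python
from typing import Tuple


def gain(base: float, evolved: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = evolved - base
    relative = absolute / base * 100.0
    return round(absolute, 2), round(relative, 1)


# The two base -> Gene-evolved CritPt pairs quoted in the summary, in percent.
pairs = [(9.1, 18.57), (17.7, 27.14)]
results = [gain(base, evolved) for base, evolved in pairs]

for (base, evolved), (abs_points, rel_percent) in zip(pairs, results):
    print(f"{base}% -> {evolved}%: +{abs_points} points, +{rel_percent}% relative")
```

Both pairs gain roughly 9.5 percentage points, which roughly doubles the weaker base score and raises the stronger one by about half.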
Key takeaway
For AI scientists designing systems that learn from experience, compact, control-oriented representations such as the "Gene" model matter more than documentation volume. Prioritize distilling failure information into concise warnings over naively appending raw failure history: the report finds this improves iterative evolution, and Gene-evolved systems rose from 9.1% to 18.57% on CritPt.
Key insights
Compact, control-oriented "Gene" representations are better suited to experience reuse and evolution than documentation-heavy "Skill" packages.
Principles
- Representation is a first-order factor in experience reuse.
- Compact, control-oriented objects are key for effective evolution.
- Distilled warnings are more useful than raw failure history.
Method
The study evaluates experience representations (Skill vs. Gene) through 4,590 controlled trials in 45 code-solving scenarios, assessing their impact on test-time control and iterative evolution.
In practice
- Prioritize compact, control-oriented experience encoding.
- Distill failure information into concise warnings.
- Consider "Gene" structures for iterative system improvement.
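The practices above can be sketched as a small data structure. This is a minimal illustration, not the report's schema: the `Gene` class, its fields, the `max_warnings` cap, and the `distill` heuristic (keep the last non-empty line of a failure log) are all hypothetical stand-ins. A real system would likely distill failures with a model rather than a string rule.

```python
from dataclasses import dataclass, field
from typing import List


def distill(raw_log: str, max_chars: int = 80) -> str:
    """Toy distillation (assumption): keep the last non-empty log line, truncated."""
    lines = [line.strip() for line in raw_log.splitlines() if line.strip()]
    return lines[-1][:max_chars] if lines else ""


@dataclass
class Gene:
    """Hypothetical compact, control-oriented experience record."""
    strategy: str
    warnings: List[str] = field(default_factory=list)
    max_warnings: int = 3  # cap keeps the record compact as experience accumulates

    def add_failure(self, raw_log: str) -> None:
        """Store a distilled warning instead of appending the raw failure log."""
        warning = distill(raw_log)
        if warning and warning not in self.warnings:
            self.warnings.append(warning)
        # Retain only the most recent warnings.
        self.warnings = self.warnings[-self.max_warnings:]

    def to_prompt(self) -> str:
        """Render the record as short control text for the solver."""
        lines = [f"Strategy: {self.strategy}"]
        lines += [f"Avoid: {warning}" for warning in self.warnings]
        return "\n".join(lines)


raw_failure = (
    "Traceback (most recent call last):\n"
    '  File "solve.py", line 12, in <module>\n'
    "ValueError: step size must be positive"
)
gene = Gene(strategy="Check units before integrating the ODE.")
gene.add_failure(raw_failure)
gene.add_failure(raw_failure)  # a repeated failure does not grow the record
print(gene.to_prompt())
```

The design choice illustrated here is that the record stays bounded: each failure contributes at most one short line, and duplicates or old warnings are dropped rather than accumulated.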
Topics
- Experience Representation
- Test-Time Evolution
- Gene Representation
- Skill Packages
- Iterative Learning
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.