Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents
Summary
SkillTTA (Test-Time Adaptive Skill Synthesis) is a method for LLM agents that generates task-specific textual skills on demand. Instead of relying on a static skill library or on iterative parameter updates, SkillTTA retrieves a small set of training trajectories relevant to a given test task and synthesizes them into a temporary skill. This skill, formatted as a SKILL.md file, is injected into the solver's prompt while the solver model itself stays fixed. The method was evaluated on SpreadsheetBench, ALFWorld, and BigCodeBench and showed significant performance improvements. For instance, SpreadsheetBench Pass@1 increased from 0.397 to 0.505, and BigCodeBench Pass@1 improved from 0.517 to 0.651, when GPT-5.5 was used for synthesis. Ablation studies showed that synthesized skills outperform prompting with raw trajectories, that a small top-$k$ retrieval size works best, and that failed trajectories are particularly valuable because they expose common evaluator-facing mistakes.
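To make the reported gains concrete, the short script below converts the Pass@1 figures quoted above into absolute and relative improvements. It uses only the numbers stated in this summary; the helper name `gains` is hypothetical.

```python
# Pass@1 figures quoted in the summary (GPT-5.5 used for synthesis).
reported = {
    "SpreadsheetBench": (0.397, 0.505),
    "BigCodeBench": (0.517, 0.651),
}


def gains(baseline: float, adapted: float) -> tuple[float, float]:
    """Return (absolute gain in Pass@1, relative gain as a fraction of the baseline)."""
    absolute = adapted - baseline
    return absolute, absolute / baseline


for bench, (base, adapted) in reported.items():
    abs_gain, rel_gain = gains(base, adapted)
    print(f"{bench}: +{abs_gain:.3f} absolute, +{rel_gain:.1%} relative")
```

Running it prints `SpreadsheetBench: +0.108 absolute, +27.2% relative` and `BigCodeBench: +0.134 absolute, +25.9% relative`, so both benchmarks improve by roughly a quarter relative to their baselines.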
Key takeaway
For NLP engineers developing LLM agents, dynamic skill synthesis is worth considering as a way to improve task performance. SkillTTA generates task-specific skills from retrieved trajectories, especially failed ones, and this substantially improved accuracy on benchmarks such as SpreadsheetBench and BigCodeBench. Because it needs no parameter updates, the method is a lightweight alternative to static skill libraries or costly iterative adaptation, letting your agents specialize their behavior efficiently at test time.
Key insights
Dynamically synthesizing task-specific skills from retrieved trajectories improves LLM agent performance without parameter updates.
Principles
- Adapt textual policy, not model weights.
- Synthesize skills after seeing target task metadata.
- Failed trajectories provide high-value corrective evidence.
Method
SkillTTA builds a pool of training trajectories, embeds task metadata, retrieves the top-$k$ trajectories most related to each test task, and synthesizes them into a temporary SKILL.md that is passed to the fixed solver.
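The pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `synthesize_skill` only builds the synthesis prompt and hands it to a caller-supplied `llm` callable (a hypothetical placeholder for the synthesizer model). All class and function names are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    task_metadata: str  # task description used as the retrieval key
    steps: str          # recorded actions/observations from a training run
    succeeded: bool     # whether the evaluator accepted the final answer


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(pool: list[Trajectory], test_metadata: str, k: int = 3) -> list[Trajectory]:
    """Return the k pool trajectories whose task metadata is most similar to the test task."""
    query = embed(test_metadata)
    ranked = sorted(pool, key=lambda t: cosine(query, embed(t.task_metadata)), reverse=True)
    return ranked[:k]


def synthesize_skill(test_metadata: str, retrieved: list[Trajectory],
                     llm: Callable[[str], str]) -> str:
    """Ask a synthesizer model (hypothetical `llm` callable) to distill evidence into SKILL.md."""
    parts = [
        "Write a concise SKILL.md with reusable guidance for the test task below.",
        "Highlight mistakes seen in failed trajectories.",
        f"Test task: {test_metadata}",
    ]
    for i, t in enumerate(retrieved, 1):
        outcome = "SUCCESS" if t.succeeded else "FAILURE"
        parts.append(f"--- Trajectory {i} ({outcome}): {t.task_metadata}\n{t.steps}")
    return llm("\n\n".join(parts))
```

Passing an echo function such as `llm=lambda prompt: prompt` is a convenient way to inspect exactly what the synthesizer would receive. Note that the synthesis step is what separates this from raw trajectory prompting: the model is asked to distill the evidence rather than have it pasted verbatim into the solver's prompt.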
In practice
- Use small top-$k$ for trajectory retrieval.
- Prioritize failed trajectories for skill synthesis.
- Inject synthesized skills as text into solver prompts.
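The three practices above can be combined in a small selection-and-injection step. The sketch below is hypothetical: the failures-first ordering is one possible reading of "prioritize failed trajectories" (the summary does not specify a rule), the `<skill>` delimiter is an invented convention, and `k=3` is an illustrative default, since the summary reports only that a small top-$k$ works best.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    task_metadata: str
    steps: str
    succeeded: bool
    similarity: float  # retrieval score against the test task


def select_evidence(candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Keep at most k trajectories: failures first, then by descending similarity.

    Hypothetical heuristic for prioritizing failed trajectories; not taken from the paper.
    """
    # False sorts before True, so failed trajectories come first.
    ranked = sorted(candidates, key=lambda c: (c.succeeded, -c.similarity))
    return ranked[:k]


def build_solver_prompt(base_instructions: str, skill_md: str, task: str) -> str:
    """Inject the synthesized skill as plain text; the solver's weights never change."""
    return (
        f"{base_instructions}\n\n"
        "<skill>\n"  # hypothetical delimiter around the SKILL.md text
        f"{skill_md.strip()}\n"
        "</skill>\n\n"
        f"Task:\n{task}"
    )
```

A strict failures-first ordering lets a weakly related failure displace a closely related success; a quota on failures or a minimum similarity threshold are simple alternatives if that proves too aggressive.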
Topics
- LLM Agents
- Test-Time Adaptive Skill Synthesis
- Trajectory Retrieval
- Parameter-Free Adaptation
- SpreadsheetBench
Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.