Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

SkillTTA (Test-Time Adaptive Skill Synthesis) is a method for LLM agents that generates task-specific textual skills at test time. Instead of relying on a static skill library or on iterative parameter updates, it retrieves a small set of relevant training trajectories for each test task and synthesizes them into a temporary skill. This skill, formatted as a SKILL.md file, is injected into the solver's prompt while the solver model itself stays fixed. The method was evaluated on SpreadsheetBench, ALFWorld, and BigCodeBench. For instance, SpreadsheetBench Pass@1 increased from 0.397 to 0.505, and BigCodeBench Pass@1 improved from 0.517 to 0.651 when using GPT-5.5 for synthesis. Ablation studies found that synthesized skills outperform raw trajectory prompting, that a small top-$k$ retrieval size works best, and that failed trajectories are particularly valuable because they expose common evaluator-facing mistakes.
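To put the two reported Pass@1 changes on a common scale, the short calculation below derives the absolute and relative gains. It uses only the four figures quoted in the summary; the dictionary layout and the `gains` helper are illustrative names, not part of the paper.

```python
from typing import Tuple

# Pass@1 values as quoted in the summary: (before, after).
reported = {
    "SpreadsheetBench": (0.397, 0.505),
    "BigCodeBench": (0.517, 0.651),
}


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute Pass@1 gain, relative gain in percent)."""
    absolute = after - before
    return round(absolute, 3), round(100 * absolute / before, 1)


for name, (before, after) in reported.items():
    abs_gain, rel_gain = gains(before, after)
    print(f"{name}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
# SpreadsheetBench: +0.108 absolute, +27.2% relative
# BigCodeBench: +0.134 absolute, +25.9% relative
```

Both benchmarks gain roughly a quarter over their starting Pass@1, which suggests the effect is not specific to one task family.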

Key takeaway

NLP engineers developing LLM agents should consider dynamic skill synthesis as a way to improve task performance. SkillTTA's approach of generating task-specific skills from retrieved trajectories, failed ones included, raised Pass@1 on benchmarks such as SpreadsheetBench and BigCodeBench. Because the solver's parameters are never updated, the method is a lightweight alternative to static skill libraries and to costly iterative adaptation, letting agents specialize their behavior at test time.

Key insights

Dynamically synthesizing task-specific skills from retrieved trajectories improves LLM agent performance without parameter updates.

Principles

Method

SkillTTA builds a trajectory pool, embeds task metadata, retrieves top-$k$ related trajectories at test time, and synthesizes them into a temporary SKILL.md for the fixed solver.
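The four stages named above (pool, embedding, top-$k$ retrieval, synthesis into a prompt-injected skill) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for whatever encoder the paper uses, `template_synthesizer` is a placeholder for the LLM synthesis call, and every class and function name here is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task: str      # task description / metadata used for retrieval
    steps: str     # free-text record of what the agent did
    success: bool  # failed runs stay in the pool on purpose


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts as a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(pool: List[Trajectory], task: str, k: int = 3) -> List[Trajectory]:
    # Rank pooled trajectories by similarity of their task metadata to the test task.
    query = embed(task)
    ranked = sorted(pool, key=lambda t: cosine(query, embed(t.task)), reverse=True)
    return ranked[:k]


def template_synthesizer(task: str, trajectories: List[Trajectory]) -> str:
    # Placeholder for the LLM synthesis step: it only sorts retrieved steps
    # into successes and failures under SKILL.md-style headings.
    lines = ["# SKILL.md", f"Task: {task}", "", "## What worked"]
    lines += [f"- {t.steps}" for t in trajectories if t.success]
    lines += ["", "## Mistakes to avoid"]
    lines += [f"- {t.steps}" for t in trajectories if not t.success]
    return "\n".join(lines)


def build_solver_prompt(
    pool: List[Trajectory],
    task: str,
    synthesize: Callable[[str, List[Trajectory]], str] = template_synthesizer,
    k: int = 3,
) -> str:
    # The temporary skill is prepended to the prompt; the solver is untouched.
    skill = synthesize(task, retrieve(pool, task, k))
    return f"{skill}\n\n---\nNow solve the task:\n{task}"


pool = [
    Trajectory("sum a column in a spreadsheet", "wrote the total to the target cell", True),
    Trajectory("sort rows in a spreadsheet by date", "overwrote the header row while sorting", False),
    Trajectory("find the mug in the kitchen", "opened each cabinet in turn", True),
]
prompt = build_solver_prompt(pool, "average a column in a spreadsheet", k=2)
```

With `k=2`, the two spreadsheet trajectories are retrieved and the kitchen one is left out, so the failed sorting run surfaces under "Mistakes to avoid". In a real system the `synthesize` argument would wrap a call to the synthesis model, and the skill would be discarded after the task.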

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.