Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

2026-05-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

SkillTTA is a Test-Time Adaptive Skill Synthesis method for LLM agents that generates task-specific textual skills on demand. Instead of relying on static skill libraries or iterative parameter updates, SkillTTA retrieves a small set of relevant training trajectories for a given test task and synthesizes them into a temporary skill. This skill, formatted as SKILL.md, is injected into the solver's prompt while the solver model itself stays fixed. The method was evaluated on SpreadsheetBench, ALFWorld, and BigCodeBench. For instance, SpreadsheetBench Pass@1 increased from 0.397 to 0.505, and BigCodeBench Pass@1 improved from 0.517 to 0.651 when using GPT-5.5 for synthesis. Ablation studies showed that synthesized skills outperform raw trajectory prompting, that a small top-$k$ retrieval size is optimal, and that failed trajectories are particularly valuable because they expose common evaluator-facing mistakes.
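To put the two reported Pass@1 changes on a common footing, the short calculation below derives the absolute and relative gains from the figures quoted in the summary. The helper name `gains` is illustrative only; nothing here comes from the paper beyond the four numbers.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) between two Pass@1 scores."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Pass@1 figures as quoted in the summary above.
reported = [
    ("SpreadsheetBench", 0.397, 0.505),
    ("BigCodeBench", 0.517, 0.651),
]

for name, before, after in reported:
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.3f} absolute, {relative:.1%} relative")

# SpreadsheetBench: +0.108 absolute, 27.2% relative
# BigCodeBench: +0.134 absolute, 25.9% relative
```

Both benchmarks improve by roughly a quarter relative to their baselines.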

Key takeaway

NLP engineers developing LLM agents should consider dynamic skill synthesis as a way to improve task performance. SkillTTA generates task-specific skills from retrieved trajectories, especially failed ones, and the reported results show higher Pass@1 on benchmarks such as SpreadsheetBench and BigCodeBench. Because it changes only the prompt and never the model weights, the method is a lightweight alternative to static skill libraries or costly iterative adaptation, letting agents specialize their behavior at test time.

Key insights

Dynamically synthesizing task-specific skills from retrieved trajectories improves LLM agent performance without parameter updates.

Principles

Adapt textual policy, not model weights.
Synthesize skills after seeing target task metadata.
Failed trajectories provide high-value corrective evidence.

Method

SkillTTA builds a trajectory pool, embeds task metadata, retrieves top-$k$ related trajectories at test time, and synthesizes them into a temporary SKILL.md for the fixed solver.
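The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Trajectory` fields, the function names, the bag-of-words `embed` (a stand-in for a real embedding model), and the prompt wording are all hypothetical, and the synthesizer is passed in as a plain callable so that no particular LLM API is assumed.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task_metadata: str  # text describing the training task (assumed field)
    transcript: str     # recorded agent steps (assumed field)
    succeeded: bool     # whether the attempt passed evaluation (assumed field)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(pool: List[Trajectory], task_metadata: str, k: int = 3) -> List[Trajectory]:
    """Rank pooled trajectories by metadata similarity to the test task."""
    query = embed(task_metadata)
    ranked = sorted(pool, key=lambda t: cosine(query, embed(t.task_metadata)), reverse=True)
    return ranked[:k]


def synthesize_skill(task_metadata: str, retrieved: List[Trajectory],
                     synthesizer: Callable[[str], str]) -> str:
    """Ask the synthesizer model to distill the retrieved evidence into SKILL.md text."""
    evidence = "\n\n".join(
        f"[{'SUCCESS' if t.succeeded else 'FAILURE'}] {t.task_metadata}\n{t.transcript}"
        for t in retrieved
    )
    prompt = (
        f"Target task:\n{task_metadata}\n\n"
        f"Related trajectories:\n{evidence}\n\n"
        "Write a SKILL.md for this task."
    )
    return synthesizer(prompt)


def build_solver_prompt(skill_md: str, task: str) -> str:
    """Inject the temporary skill as plain text; the solver model is untouched."""
    return f"{skill_md}\n\n---\n\nTask:\n{task}"
```

In use, `synthesizer` would wrap a call to whichever model writes the skill, and the string returned by `build_solver_prompt` would be sent to the fixed solver. The skill is rebuilt for each test task and discarded afterwards, which is what makes it temporary.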

In practice

Use small top-$k$ for trajectory retrieval.
Prioritize failed trajectories for skill synthesis.
Inject synthesized skills as text into solver prompts.
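One possible way to combine the first two recommendations is sketched below. The summary does not say how failed trajectories are prioritized, so the rule used here (shortlist the most similar candidates, then place failures ahead of successes) is purely an assumption for illustration, as are the `Candidate` tuple layout and the `k` and `shortlist` defaults.

```python
from typing import List, Tuple

# (trajectory id, similarity to the test task, succeeded) - hypothetical layout
Candidate = Tuple[str, float, bool]


def select_for_synthesis(candidates: List[Candidate], k: int = 3, shortlist: int = 6) -> List[str]:
    """Pick k trajectory ids: nearest neighbours first, failures ahead of successes."""
    # Step 1: keep only the most similar candidates so that relevance still dominates.
    nearest = sorted(candidates, key=lambda c: c[1], reverse=True)[:shortlist]
    # Step 2: within the shortlist, failed runs (succeeded=False) sort first,
    # and ties are broken by higher similarity.
    prioritized = sorted(nearest, key=lambda c: (c[2], -c[1]))
    return [c[0] for c in prioritized[:k]]


candidates = [
    ("a", 0.9, True),
    ("b", 0.8, False),
    ("c", 0.7, True),
    ("d", 0.6, False),
    ("e", 0.1, False),
]
print(select_for_synthesis(candidates, k=2, shortlist=4))  # ['b', 'd']
```

The shortlist step keeps a distant failure such as `e` from displacing closely related evidence, and a small `k` keeps the synthesized skill focused.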

Topics

LLM Agents
Test-Time Adaptive Skill Synthesis
Trajectory Retrieval
Parameter-Free Adaptation
SpreadsheetBench

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.