The Scaling Laws of Skills in LLM Agent Systems

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

A study across 15 frontier LLMs, 1,141 real-world skills, and over 3 million routing decisions identifies two coupled scaling laws governing skill accumulation in LLM agent systems. The "routing law" states that single-step routing accuracy decays logarithmically with library size, following $Acc(N)=a-b\ln N$ ($R^{2}{>}0.97$). Errors progress from local skill competition to cross-family drift and "black-hole skill" capture. The "execution law" reveals that while joint routing is approximately multiplicative before state realization, correct execution can improve difficult downstream decisions by approximately $4\times$. A single parameter, the routing logarithmic decay slope $b$, couples these laws, predicting execution-side rescue across models. Law-guided optimization, including nearest-neighbor auditing, description-boundary rewriting, abstract-skill removal, and prompt anchoring, improved held-out routing accuracy from 71.3% to 91.7% and reduced hijack from 22.4% to 4.1%. These optimizations also directionally transferred to downstream ClawBench and ClawMark execution settings, increasing mean pass rates from 49.3% to 61.6% and 28.4% to 34.5%, respectively.
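
To make the routing law concrete, here is a minimal sketch of how the form $Acc(N)=a-b\ln N$ could be fitted to routing measurements and then extrapolated to a larger library. The data points, the function names (`fit_log_decay`, `predict_accuracy`, `joint_accuracy`), and the coefficient values are illustrative assumptions, not figures or code from the study. The `joint_accuracy` helper reflects only the "approximately multiplicative" reading of joint routing stated above; it does not model execution-side rescue.

```python
import math


def fit_log_decay(sizes, accuracies):
    """Ordinary least-squares fit of acc = a - b * ln(N).

    Returns (a, b, r2). Hypothetical helper, not the study's fitting code.
    """
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx            # slope of accuracy against ln(N)
    a = mean_y - slope * mean_x  # intercept, i.e. accuracy at N = 1
    b = -slope                   # decay slope, positive when accuracy falls
    ss_res = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, accuracies))
    ss_tot = sum((y - mean_y) ** 2 for y in accuracies)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2


def predict_accuracy(a, b, library_size):
    """Single-step routing accuracy predicted by the fitted law."""
    return a - b * math.log(library_size)


def joint_accuracy(per_step_accuracies):
    """Joint accuracy of a multi-step route if steps compose multiplicatively."""
    return math.prod(per_step_accuracies)


# Synthetic measurements generated from assumed coefficients (not real data).
sizes = [10, 50, 100, 500, 1000]
assumed_a, assumed_b = 0.98, 0.04
accs = [assumed_a - assumed_b * math.log(n) for n in sizes]

a, b, r2 = fit_log_decay(sizes, accs)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
print(f"predicted accuracy at N=5000: {predict_accuracy(a, b, 5000):.3f}")
```

Under this reading, a larger fitted $b$ means each doubling of the library costs more single-step accuracy, and the multiplicative composition compounds that loss across the steps of a multi-step route.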

Key takeaway

AI engineers building LLM agent systems should treat the skill library as a design variable in its own right. Actively manage its structure, granularity, and exposure policy rather than focusing solely on model capability. Strategies such as description-boundary rewriting and abstract-skill removal mitigate the logarithmic accuracy decay and guard against "black-hole skill" capture, improving both routing accuracy and downstream task execution.
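
One way to picture a nearest-neighbor audit of a skill library is sketched below. This summary does not describe how the study performs its audit, so the sketch substitutes a simple stand-in: Jaccard overlap between the word sets of skill descriptions, flagging each skill whose closest neighbor exceeds a threshold. The skill names, descriptions, threshold, and the function `nearest_neighbor_audit` are all hypothetical.

```python
import re


def tokens(description):
    """Lowercased set of alphanumeric word tokens in a description."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_audit(skills, threshold=0.5):
    """Flag skills whose most similar neighbor meets the threshold.

    Returns (skill, nearest_neighbor, similarity) tuples, most similar first.
    Token overlap is an assumed stand-in for whatever similarity measure
    a real audit would use (for example, embedding distance).
    """
    token_sets = {name: tokens(desc) for name, desc in skills.items()}
    flagged = []
    for name, toks in token_sets.items():
        best_name, best_sim = None, 0.0
        for other, other_toks in token_sets.items():
            if other == name:
                continue
            sim = jaccard(toks, other_toks)
            if sim > best_sim:
                best_name, best_sim = other, sim
        if best_name is not None and best_sim >= threshold:
            flagged.append((name, best_name, best_sim))
    return sorted(flagged, key=lambda item: -item[2])


# Hypothetical skill library for illustration only.
skills = {
    "send_email": "Send an email message to a recipient",
    "send_message": "Send a chat message to a recipient",
    "resize_image": "Resize an image file to given dimensions",
}

flagged = nearest_neighbor_audit(skills)
for name, neighbor, sim in flagged:
    print(f"{name} overlaps with {neighbor} (similarity {sim:.2f})")
```

Pairs flagged this way are candidates for description-boundary rewriting: each description is edited to state what the skill does not cover, so the two no longer compete for the same requests.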

Key insights

LLM agent performance depends on skill library structure, granularity, and exposure policy, not just model capability.

Method

The study empirically derives two coupled scaling laws for LLM agent skill systems by analyzing routing and execution decisions across diverse LLMs and a large skill library, then validates these laws through targeted optimizations.

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.