When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
Summary
SkillReact is a compositional security measurement framework that addresses the risk of individually safe LLM agent skills combining into unsafe installed skill sets. Of 1,520 ClawHub skills, 651 passed individual inspection, forming 211,575 pairs; the static benchmark flagged 22.25% of those pairs as structural candidates. A human-adjudicated audit found that 18.2% of the flagged pairs were genuine compositional risks, implying approximately 14,000 hidden risk memberships in a single registry that per-skill scanning misses. An action-based harness further showed that whether these risks turn into model-issued tool calls depends on the host model's disposition: Haiku-4-5 issued dropper-stage tool calls in all 39 direct-prompt trials (36 full chain, 3 download-only), Opus-4-7 stopped at download, and Sonnet-4-6 refused. These results point to the need for install-time compositional checks and capability isolation.
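The pair count in the summary follows directly from the number of skills that passed individual inspection, since every unordered pair of those skills is a candidate composition. The short check below reproduces that arithmetic. The flagged-pair count is derived here from the rounded 22.25% figure, so it is an approximation and not a number reported in the summary.

```python
from math import comb

passed_skills = 651   # skills that passed individual inspection (from the summary)
flag_rate = 0.2225    # share of pairs flagged as structural candidates (from the summary)

# Every unordered pair of individually safe skills is one candidate composition.
total_pairs = comb(passed_skills, 2)

# Approximate count of flagged pairs, derived from the rounded percentage.
flagged_pairs = round(total_pairs * flag_rate)

print(f"total pairs:   {total_pairs:,}")
print(f"flagged pairs: ~{flagged_pairs:,}")
```

The quadratic growth is the practical point: a registry that roughly doubles in size roughly quadruples the number of pairs that a compositional check has to consider.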
Key takeaway
AI Security Engineers deploying LLM agents that integrate community-contributed skills should implement install-time compositional security checks. Relying solely on per-skill safety scans leaves systems exposed to risks that emerge only when skills are combined. Audit skill registries for hidden compositional vulnerabilities, and consider capability isolation to reduce dependence on the host model's disposition.
Key insights
Individually safe agent skills can compose into unsafe installed skill sets, posing a significant compositional security risk.
Principles
- Per-skill safety scanning is insufficient.
- Host model disposition gates exploit realization.
- Composition determines which capabilities are reachable.
Method
SkillReact uses a static-composition benchmark, LLM-assisted human adjudication, and an action-based exploitability harness to measure compositional security.
In practice
- Implement install-time compositional checks.
- Apply capability isolation for agent skills.
- Audit skill registries for hidden risks.
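As a rough illustration of an install-time compositional check, the sketch below flags pairs of installed skills whose combined capabilities cover a risky combination that neither skill covers on its own. It is not SkillReact's method: the capability labels, the `RISKY_COMBOS` list, the `risky_pairs` helper, and the example skills are all hypothetical, chosen only to mirror the download-then-execute chain mentioned in the summary.

```python
from itertools import combinations

# Hypothetical risky capability combinations (assumption, not from the paper).
RISKY_COMBOS = [frozenset({"network_download", "shell_exec"})]


def risky_pairs(skills):
    """Return (name_a, name_b, combo) for each pair of skills that jointly
    covers a risky combo which neither skill covers alone."""
    findings = []
    for (a, caps_a), (b, caps_b) in combinations(sorted(skills.items()), 2):
        for combo in RISKY_COMBOS:
            covered_jointly = combo <= (caps_a | caps_b)
            covered_alone = combo <= caps_a or combo <= caps_b
            if covered_jointly and not covered_alone:
                findings.append((a, b, combo))
    return findings


# Hypothetical installed skill set with declared capabilities.
installed = {
    "fetcher": {"network_download", "file_write"},
    "runner": {"file_read", "shell_exec"},
    "notes": {"file_read", "file_write"},
}

for a, b, combo in risky_pairs(installed):
    print(f"{a} + {b} jointly enable: {sorted(combo)}")
```

A check like this would run when a skill is added to an existing set, so that a pair is rejected or isolated before the host model ever sees both tools. It only works as well as the declared capabilities are accurate, which is one reason the summary pairs it with capability isolation.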
Topics
- LLM Agents
- Compositional Security
- AI Safety
- Skill Ecosystems
- SkillReact
- Vulnerability Measurement
Best for: CTO, AI Architect, Research Scientist, AI Scientist, AI Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.