SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills
Summary
SKILLVETBENCH is a live public leaderboard on Hugging Face for vetting the security of open-source LLM agent skills with an LLM-as-Judge approach. It targets a gap left by existing code-layer scanners, which fail to detect instruction-layer and multi-agent risks such as natural-language directives that hijack agents or exfiltrate data. The system introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems, and integrates full CVSS v4.0 vector decomposition and a ClawHub dual-view. In the reported demonstrations, the LLM-as-Judge produces zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls, outperforming static baselines such as SKILLSIEVE, which misses 15%. Conventional tools miss 89% to 100% of instruction-layer threats such as Prompt Injection and Memory Poisoning. Detection rates vary from 35% to 95% across four LLM evaluators, which motivates ensemble scoring.
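To make "CVSS v4.0 vector decomposition" concrete: a CVSS v4.0 vector is a slash-separated string of `metric:value` pairs prefixed by `CVSS:4.0`. The sketch below splits such a string into a dictionary and checks that the eleven base metrics are present. The helper name `decompose_cvss4` and the sample vector are illustrative assumptions, not taken from SKILLVETBENCH, and the sketch does not validate metric values or compute a CVSS score.

```python
# Hypothetical helper (not from the source): split a CVSS v4.0 vector string
# into its metric/value pairs. Values are not validated and no score is computed.

# The eleven base metrics defined by CVSS v4.0.
BASE_METRICS = ("AV", "AC", "AT", "PR", "UI", "VC", "VI", "VA", "SC", "SI", "SA")


def decompose_cvss4(vector: str) -> dict[str, str]:
    """Return a {metric: value} mapping for a CVSS v4.0 vector string."""
    prefix, *parts = vector.strip().split("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"not a CVSS v4.0 vector: {vector!r}")

    metrics: dict[str, str] = {}
    for part in parts:
        name, sep, value = part.partition(":")
        if not sep or not name or not value:
            raise ValueError(f"malformed metric: {part!r}")
        if name in metrics:
            raise ValueError(f"duplicate metric: {name}")
        metrics[name] = value

    missing = [m for m in BASE_METRICS if m not in metrics]
    if missing:
        raise ValueError(f"missing base metrics: {missing}")
    return metrics


# Made-up sample vector, used only to exercise the parser.
example = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:N/SC:N/SI:N/SA:N"
metrics = decompose_cvss4(example)
```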
Key takeaway
AI security engineers and teams deploying LLM agents should not rely on traditional code-layer security scanners alone to vet open-source skills, because these tools miss up to 100% of instruction-layer threats. Integrate semantic, multi-dimensional vetting such as SKILLVETBENCH's LLM-as-Judge approach to cover risks like prompt injection and memory poisoning. Because detection rates differ widely between LLM evaluators, consider ensemble scoring to improve reliability and coverage.
Key insights
LLM-as-Judge effectively vets open-source LLM agent skills for multi-dimensional security risks, surpassing traditional code-layer scanners.
Principles
- Semantic vetting is crucial for instruction-layer and multi-agent risks.
- In the reported demonstration, LLM-as-Judge achieved zero false negatives across 78 confirmed-malicious skills.
- Detection rates vary widely across LLM evaluators (35% to 95%), which motivates ensemble scoring.
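The false-negative and false-positive figures quoted in the summary are ordinary confusion-matrix rates. A minimal sketch of the arithmetic, using the 78 malicious and 22 benign skills from the summary, follows; the function name `error_rates` is an illustrative assumption.

```python
def error_rates(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute false-negative and false-positive rates from confusion-matrix counts."""
    malicious = tp + fn  # skills that are actually malicious
    benign = tn + fp     # skills that are actually benign
    if malicious == 0 or benign == 0:
        raise ValueError("need at least one malicious and one benign sample")
    return {
        "false_negative_rate": fn / malicious,
        "false_positive_rate": fp / benign,
    }


# Counts from the summary: all 78 malicious skills flagged, all 22 benign skills passed.
judge = error_rates(tp=78, fn=0, tn=22, fp=0)
print(judge)  # {'false_negative_rate': 0.0, 'false_positive_rate': 0.0}
```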
Method
SKILLVETBENCH employs an LLM-as-Judge to evaluate agent skills, utilizing SARS, a five-dimensional agentic-risk metric, and integrating CVSS v4.0 vector decomposition for comprehensive risk assessment.
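The summary does not list the five SARS dimensions, their weights, or the exact form of the formula. Purely to illustrate what a weighted multi-dimensional score can look like, the sketch below assumes a normalized weighted average over five placeholder dimensions with equal weights. The dimension names, the weights, and the function `weighted_risk_score` are all hypothetical, so this should not be read as the actual SARS definition.

```python
# Placeholder dimensions and equal weights: these are assumptions for
# illustration only. The source does not give the real SARS dimensions or weights.
PLACEHOLDER_WEIGHTS = {
    "dimension_1": 0.2,
    "dimension_2": 0.2,
    "dimension_3": 0.2,
    "dimension_4": 0.2,
    "dimension_5": 0.2,
}


def weighted_risk_score(
    scores: dict[str, float],
    weights: dict[str, float] = PLACEHOLDER_WEIGHTS,
) -> float:
    """Weighted average of per-dimension risk scores, each expected in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if any(not 0.0 <= s <= 1.0 for s in scores.values()):
        raise ValueError("each dimension score must lie in [0, 1]")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result in [0, 1]
    # even if the weights do not sum exactly to 1.
    return sum(weights[d] * scores[d] for d in weights) / total_weight


# Made-up per-dimension scores, as an LLM judge might return them.
sample_scores = {
    "dimension_1": 0.9,
    "dimension_2": 0.1,
    "dimension_3": 0.5,
    "dimension_4": 0.0,
    "dimension_5": 1.0,
}
overall = weighted_risk_score(sample_scores)
```

With equal weights the result is simply the mean of the five scores. Unequal weights would let one dimension dominate the overall score, which is the kind of choice a principled formula has to justify.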
In practice
- Adopt LLM-as-Judge for agent skill security vetting.
- Implement multi-dimensional risk metrics like SARS.
- Deploy ensemble scoring for LLM-based security evaluators.
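The summary motivates ensemble scoring but does not say how the judges are combined. As one possible sketch, assume each LLM judge returns a risk score in [0, 1] and the ensemble applies a simple aggregation rule. The function `ensemble_verdict`, the judge names, the threshold, and the three rules shown are illustrative assumptions rather than SKILLVETBENCH's method.

```python
from statistics import mean


def ensemble_verdict(
    judge_scores: dict[str, float],
    threshold: float = 0.5,
    rule: str = "mean",
) -> bool:
    """Return True if the ensemble flags the skill as risky.

    rule="mean":     flag when the average score reaches the threshold
    rule="any":      flag when at least one judge reaches the threshold
    rule="majority": flag when more than half of the judges reach it
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    scores = list(judge_scores.values())
    if rule == "mean":
        return mean(scores) >= threshold
    if rule == "any":
        return max(scores) >= threshold
    if rule == "majority":
        votes = sum(s >= threshold for s in scores)
        return votes * 2 > len(scores)
    raise ValueError(f"unknown rule: {rule!r}")


# Made-up scores from four hypothetical judges that disagree.
scores = {"judge_a": 0.9, "judge_b": 0.2, "judge_c": 0.3, "judge_d": 0.1}
flag_mean = ensemble_verdict(scores, rule="mean")
flag_any = ensemble_verdict(scores, rule="any")
flag_majority = ensemble_verdict(scores, rule="majority")
```

In this example only the "any" rule flags the skill. That rule favors recall at the cost of more false positives, while "mean" and "majority" can miss a threat that only the strongest judge detects.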
Topics
- LLM Agents
- Security Risk Evaluation
- Open-Source LLMs
- LLM-as-Judge
- Prompt Injection
- Memory Poisoning
- CVSS v4.0
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.