SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills

2026-06-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

SKILLVETBENCH is a live public leaderboard on Hugging Face that vets the security of open-source LLM agent skills using an LLM-as-Judge approach. It targets a gap left by existing code-layer scanners, which fail to detect instruction-layer and multi-agent risks such as natural-language directives that hijack agents or exfiltrate data. The system introduces SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems, and integrates full CVSS v4.0 vector decomposition and a ClawHub dual-view. In the reported demonstrations, the LLM-as-Judge produces zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls, compared with a 15% miss rate for the static baseline SKILLSIEVE. Conventional tools miss 89% to 100% of instruction-layer threats such as Prompt Injection and Memory Poisoning. Detection rates range from 35% to 95% across four LLM evaluators, which motivates ensemble scoring.

Key takeaway

For AI security engineers and teams deploying LLM agents: traditional code-layer security scanners are inadequate for vetting open-source skills, missing 89% to 100% of instruction-layer threats. Consider adding semantic, multi-dimensional vetting such as SKILLVETBENCH's LLM-as-Judge approach to cover risks like prompt injection and memory poisoning that static tools overlook. Because detection rates vary widely between individual LLM evaluators (35% to 95%), consider ensemble scoring rather than relying on a single judge.

Key insights

In the reported demonstrations, an LLM-as-Judge detects multi-dimensional security risks in open-source LLM agent skills that traditional code-layer scanners miss.

Principles

Semantic vetting is crucial for instruction-layer and multi-agent risks.
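LLM-as-Judge achieved zero false negatives on the 78 confirmed-malicious skills tested, with zero false positives on 22 benign controls.
Detection rates vary widely across LLM evaluators (35% to 95%), which motivates ensemble scoring.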

Method

SKILLVETBENCH employs an LLM-as-Judge to evaluate agent skills, utilizing SARS, a five-dimensional agentic-risk metric, and integrating CVSS v4.0 vector decomposition for comprehensive risk assessment.
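
To make the idea of a weighted multi-dimensional score concrete, here is a minimal Python sketch. It is not the SARS formula: the summary does not list the five dimensions, their weights, or the score range, so the dimension names, the weights, and the 0 to 10 scale below are all assumptions chosen for illustration.

```python
# Hypothetical dimensions and weights. The source does not specify
# SARS's actual dimensions, weights, or scale; these are placeholders.
ASSUMED_WEIGHTS = {
    "prompt_injection": 0.30,
    "data_exfiltration": 0.25,
    "memory_poisoning": 0.20,
    "privilege_escalation": 0.15,
    "multi_agent_propagation": 0.10,
}


def weighted_risk_score(scores, weights=ASSUMED_WEIGHTS):
    """Return the weighted mean of per-dimension scores (assumed 0-10 each)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"score for {name!r} is outside the 0-10 range")
    total_weight = sum(weights.values())
    # Dividing by the weight total keeps the result on the 0-10 scale
    # even if the weights do not sum exactly to 1.
    return sum(weights[d] * scores[d] for d in weights) / total_weight


if __name__ == "__main__":
    example = {
        "prompt_injection": 9.0,
        "data_exfiltration": 7.0,
        "memory_poisoning": 4.0,
        "privilege_escalation": 2.0,
        "multi_agent_propagation": 1.0,
    }
    print(round(weighted_risk_score(example), 2))
```

A weighted mean is only one possible aggregation; a real metric might instead let a single severe dimension dominate the result.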

In practice

Adopt LLM-as-Judge for agent skill security vetting.
Implement multi-dimensional risk metrics like SARS.
Deploy ensemble scoring for LLM-based security evaluators.
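
As a rough illustration of ensemble scoring, the Python sketch below combines risk scores from several judges into one verdict. The source does not describe how SKILLVETBENCH aggregates evaluators, so the judge interface (a callable returning a risk in [0, 1]), the "mean" and "max" strategies, and the 0.5 threshold are assumptions; the stub judges stand in for real LLM calls.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Assumed interface: a judge maps skill text to a risk score in [0, 1].
Judge = Callable[[str], float]


def ensemble_verdict(
    skill_text: str,
    judges: Sequence[Judge],
    threshold: float = 0.5,
    strategy: str = "mean",
) -> Tuple[float, bool]:
    """Combine per-judge risk scores and return (combined_score, flagged)."""
    scores = [judge(skill_text) for judge in judges]
    if not scores:
        raise ValueError("at least one judge is required")
    if strategy == "mean":
        combined = mean(scores)  # smooths out individual judges
    elif strategy == "max":
        combined = max(scores)   # one alarmed judge is enough to flag
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return combined, combined >= threshold


if __name__ == "__main__":
    # Stub judges with fixed outputs; real judges would call LLMs.
    stubs = [lambda text: 0.1, lambda text: 0.9, lambda text: 0.2]
    print(ensemble_verdict("example skill text", stubs, strategy="mean"))
    print(ensemble_verdict("example skill text", stubs, strategy="max"))
```

The "max" strategy favors recall, since a single high score flags the skill, while "mean" is less sensitive to one outlying judge; which trade-off fits depends on how costly a missed malicious skill is.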

Topics

LLM Agents
Security Risk Evaluation
Open-Source LLMs
LLM-as-Judge
Prompt Injection
Memory Poisoning
CVSS v4.0

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.