ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree
Summary
ClawHub Security Signals is a newly released, sanitized dataset of 67,453 public OpenClaw skill versions, designed to study security boundaries for AI agent skills. Each row pairs redacted SKILL.md content and bundled files with a ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Analysis shows that these scanners disagree substantially about which skills are risky. Any pair of scanners overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface: SkillSpector flags 75.3% of the 25,504 suspicious rows but only 6.8% of the 206 malicious rows, while VirusTotal identifies 72.8% of the 206 malicious rows, consistent with bundled-code malware. The corpus is a silver-standard dataset intended to support further research into layered agent-skill security.
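The overlap figures in the summary can be illustrated with a minimal Python sketch. It assumes that "combined positives" means the union of the two scanners' flagged skills, so the pairwise overlap is intersection over union. The function names, scanner keys, and toy skill IDs are hypothetical and are not taken from the dataset.

```python
from itertools import combinations


def pairwise_overlap(flags):
    """Return {(scanner_a, scanner_b): |A & B| / |A | B|} for every scanner pair.

    `flags` maps a scanner name to the set of skill IDs it flagged.
    Treating "combined positives" as the union is an assumption.
    """
    result = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        shared = flags[a] & flags[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result


def single_scanner_share(flags):
    """Return the fraction of flagged skills that exactly one scanner flagged."""
    hits = {}
    for skill_ids in flags.values():
        for skill_id in skill_ids:
            hits[skill_id] = hits.get(skill_id, 0) + 1
    if not hits:
        return 0.0
    return sum(1 for n in hits.values() if n == 1) / len(hits)


# Toy data for illustration only; these are not rows from the dataset.
toy_flags = {
    "virustotal": {"s1", "s2", "s3"},
    "static": {"s3", "s4"},
    "skillspector": {"s3", "s5", "s6", "s7"},
}

overlaps = pairwise_overlap(toy_flags)
share = single_scanner_share(toy_flags)
```

On the toy data, only `s3` is flagged by more than one scanner, so every pairwise overlap is small and most flagged skills come from a single scanner, which mirrors the pattern the summary describes.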
Key takeaway
For AI Security Engineers evaluating agent skill trustworthiness, relying on a single scanner like VirusTotal or SkillSpector is insufficient. You must implement a layered security approach, recognizing that different scanners detect distinct attack surfaces. Your security strategy should integrate multiple detection methods to cover both traditional malware in bundled code and semantic agentic risks, rather than making allow/block decisions based on isolated signals.
Key insights
AI agent skill security requires layered governance due to significant scanner disagreement across attack surfaces.
Principles
- Single security scanners are insufficient for agent skills.
- Scanner disagreement is structured by attack surface.
- Semantic risk differs from traditional malware detection.
Method
The ClawHub Security Signals dataset was created by pairing redacted SKILL.md content and bundled files with ClawScan verdicts and evidence from VirusTotal, static heuristic analysis, and NVIDIA SkillSpector.
In practice
- Use the ClawHub Security Signals dataset to develop skill-security triage models.
- Implement layered security for AI agent skills.
- Differentiate semantic agentic risk from malware detection.
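The layered approach in the list above can be sketched as a small Python triage function. This is an illustrative policy, not the method from the original article: the `ScannerSignals` type, the action labels (`no_signal`, `review`, `escalate`), and the rule that two or more scanners trigger escalation are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScannerSignals:
    """Hypothetical per-skill flags, one boolean per scanner family."""
    virustotal: bool
    static_analysis: bool
    skillspector: bool


def triage(signals: ScannerSignals) -> tuple[str, list[str]]:
    """Combine scanner signals into an action plus the reasons behind it.

    A single hit is routed to review instead of an automatic block,
    because isolated signals cover different attack surfaces.
    """
    reasons = []
    if signals.virustotal:
        reasons.append("virustotal: possible bundled-code malware")
    if signals.static_analysis:
        reasons.append("static_analysis: heuristic match")
    if signals.skillspector:
        reasons.append("skillspector: possible semantic agentic risk")

    if len(reasons) >= 2:
        action = "escalate"   # assumed threshold: agreement across scanners
    elif len(reasons) == 1:
        action = "review"     # one signal alone does not decide allow/block
    else:
        action = "no_signal"  # absence of flags is not proof of safety
    return action, reasons


action, reasons = triage(ScannerSignals(virustotal=False,
                                        static_analysis=False,
                                        skillspector=True))
```

Keeping the reasons alongside the action records which attack surface each signal points to, so a reviewer can tell a malware-style hit from a semantic one.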
Topics
- AI Agent Skills
- ClawHub Dataset
- Security Scanners
- VirusTotal
- SkillSpector
- Static Code Analysis
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Security Engineer, AI Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.