ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Data Science & Analytics) · Depth: Advanced, quick

Summary

ClawHub Security Signals is a newly released, sanitized dataset of 67,453 public OpenClaw skill versions, built to study security boundaries for AI agent skills. Each entry pairs redacted SKILL.md content and bundled files with a ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. The scanners disagree sharply about which skills are risky. Any pair of scanners overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are flagged by a single scanner. The disagreement is structured by attack surface: SkillSpector flags 75.3% of the 25,504 suspicious rows but only 6.8% of the 206 malicious rows, while VirusTotal flags 72.8% of those malicious rows, consistent with bundled-code malware. The corpus is a silver-standard dataset intended to support further research into layered agent-skill security.
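The overlap figures above can be made concrete with a small sketch. It assumes that "overlap on combined positives" means intersection over union of two scanners' flagged sets, that the single-scanner share is taken over all flagged skills, and that the all-scanner share is taken over all skills; the source does not spell out these denominators. The scanner keys and skill IDs are hypothetical toy values, not dataset content.

```python
from itertools import combinations


def pairwise_overlap(flags):
    """Intersection over union of flagged sets for every scanner pair.

    `flags` maps a scanner name to the set of skill IDs it flagged.
    """
    result = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        shared = flags[a] & flags[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result


def agreement_summary(flags, total_skills):
    """Share of flagged skills hit by one scanner, and share of all skills hit by every scanner."""
    flagged = set().union(*flags.values())
    hits = {sid: sum(sid in s for s in flags.values()) for sid in flagged}
    single = sum(1 for n in hits.values() if n == 1)
    unanimous = sum(1 for n in hits.values() if n == len(flags))
    return {
        "single_scanner_share": single / len(flagged) if flagged else 0.0,
        "all_scanners_share": unanimous / total_skills if total_skills else 0.0,
    }


# Toy data (hypothetical): six flagged skills out of ten in total.
toy_flags = {
    "virustotal": {"a", "b"},
    "static": {"b", "c", "d"},
    "skillspector": {"b", "d", "e", "f"},
}
overlaps = pairwise_overlap(toy_flags)
summary = agreement_summary(toy_flags, total_skills=10)
```

Low pairwise overlap combined with a high single-scanner share is the pattern the summary describes: most positives come from exactly one scanner.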

Key takeaway

For AI security engineers evaluating agent skill trustworthiness, a single scanner such as VirusTotal or SkillSpector is insufficient. Different scanners detect distinct attack surfaces, so a security strategy should layer multiple detection methods to cover both traditional malware in bundled code and semantic agentic risks, rather than basing allow/block decisions on isolated signals.
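One way to picture a layered approach is a triage function that groups scanner signals by attack surface before deciding anything. This is a minimal sketch, not a policy from the source: the surface labels, the scanner-to-surface mapping (especially for static analysis), and the hold/allow/review/block thresholds are all assumptions chosen for illustration.

```python
# Hypothetical mapping from scanner to the attack surface it is assumed to cover.
SURFACE_BY_SCANNER = {
    "virustotal": "bundled_code",
    "static": "static_patterns",
    "skillspector": "agentic_semantics",
}


def triage(signals):
    """Combine per-scanner booleans into one decision.

    `signals` maps scanner name -> True if that scanner flagged the skill.
    Scanner names not present in SURFACE_BY_SCANNER are ignored.
    """
    missing = sorted(set(SURFACE_BY_SCANNER) - set(signals))
    surfaces = sorted(
        {
            SURFACE_BY_SCANNER[name]
            for name, flagged in signals.items()
            if flagged and name in SURFACE_BY_SCANNER
        }
    )
    if missing:
        action = "hold"  # do not decide while a layer has not reported
    elif not surfaces:
        action = "allow"  # every layer reported and none flagged
    elif len(surfaces) > 1:
        action = "block"  # independent surfaces agree
    else:
        action = "review"  # one surface flagged: route to a reviewer
    return {"action": action, "surfaces": surfaces, "missing": missing}


decision = triage({"virustotal": False, "static": False, "skillspector": True})
```

Because most flagged skills are flagged by a single scanner, this sketch treats a lone flag as a reason to review rather than ignore, and treats a missing scanner result as a reason to hold rather than allow.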

Key insights

AI agent skill security requires layered governance due to significant scanner disagreement across attack surfaces.

Method

The ClawHub Security Signals dataset was created by pairing redacted SKILL.md content and bundled files with ClawScan verdicts and evidence from VirusTotal, static analysis, and NVIDIA SkillSpector.
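Once each row carries both a verdict and per-scanner evidence, a per-verdict flag rate shows which verdict class each scanner tends to cover. The sketch below assumes a hypothetical row layout with a `verdict` string and a `flags` dictionary; the dataset's actual field names are not given in this summary, and the rows are toy values.

```python
from collections import defaultdict


def flag_rate_by_verdict(rows, scanner):
    """Fraction of rows per verdict that the given scanner flagged."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        verdict = row["verdict"]
        totals[verdict] += 1
        if row["flags"].get(scanner, False):
            hits[verdict] += 1
    return {verdict: hits[verdict] / totals[verdict] for verdict in totals}


# Toy rows (hypothetical field names and values).
toy_rows = [
    {"verdict": "suspicious", "flags": {"skillspector": True, "virustotal": False}},
    {"verdict": "suspicious", "flags": {"skillspector": False, "virustotal": False}},
    {"verdict": "malicious", "flags": {"skillspector": False, "virustotal": True}},
    {"verdict": "malicious", "flags": {"skillspector": False, "virustotal": True}},
]
skillspector_rates = flag_rate_by_verdict(toy_rows, "skillspector")
virustotal_rates = flag_rate_by_verdict(toy_rows, "virustotal")
```

Run against the real corpus, a computation of this kind would correspond to figures such as the share of suspicious or malicious rows that a given scanner flags.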

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Security Engineer, AI Scientist, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.