[D] We scanned 18,000 exposed OpenClaw instances and found 15% of community skills contain malicious instructions

2026-02-12 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Security research on OpenClaw, a popular autonomous agent platform with 165k GitHub stars, found more than 18,000 internet-exposed instances. Analysis of its community skill repository found that nearly 15% of skills contained malicious instructions, such as prompts designed for data exfiltration, payload downloads, or credential harvesting. Once removed, these skills often reappear quickly under new identities. The primary concern is "Delegated Compromise": an agent inherits the user's broad permissions across digital services, so compromising the agent compromises everything it can reach. The supply chain risk is significant because the 700+ community skills receive no systematic security review; examples range from obvious clipboard exfiltration to subtle data leaks disguised as "debug logs." OpenClaw acknowledges these security tradeoffs, but the broader community may not fully grasp the implications.

Key takeaway

AI Architects and CTOs evaluating autonomous agent deployments should recognize that OpenClaw's model presents a "Delegated Compromise" threat, in which agents inherit extensive user permissions. Prioritize robust skill vetting, implement strict sandboxing, and explore capability-based permission models to mitigate the inherent supply chain risks and prevent agent-to-agent propagation of prompt injection attacks.

Key insights

Autonomous agents like OpenClaw present a novel and multiplicative attack surface due to broad permission inheritance and natural language exploit vectors.

Principles

Agent security is multiplicative, not additive.
Natural language is an exploit vector.
Convenience often conflicts with control.

Method

Skill definitions are parsed for patterns like base64 payloads and obfuscated URLs. Behavioral testing involves running skills in isolated environments to monitor for unexpected network calls or file system access.
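
The static half of this method can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual scanner: the function names (scan_skill, decodes_to_text), the 40-character threshold, the 90% printable-text cutoff, and the list of URL shorteners are all assumptions chosen for the example. Behavioral testing in an isolated environment is not covered here.

```python
import base64
import binascii
import re

# Heuristic patterns. The thresholds and shortener list are illustrative
# assumptions, not values taken from the original research.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
RAW_IP_URL = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}\S*")
SHORTENER_URL = re.compile(r"https?://(?:bit\.ly|tinyurl\.com|t\.co)/\S+")


def decodes_to_text(candidate: str) -> bool:
    """Return True if candidate is valid base64 that decodes to mostly printable ASCII."""
    try:
        raw = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    if not raw:
        return False
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in raw)
    return printable / len(raw) > 0.9


def scan_skill(text: str) -> list[tuple[str, str]]:
    """Return (finding_kind, evidence) pairs for suspicious patterns in a skill definition."""
    findings = []
    for match in BASE64_RUN.finditer(text):
        # Only flag runs that decode cleanly to text, to cut down on false positives.
        if decodes_to_text(match.group()):
            findings.append(("base64_payload", match.group()[:32]))
    for match in RAW_IP_URL.finditer(text):
        findings.append(("raw_ip_url", match.group()))
    for match in SHORTENER_URL.finditer(text):
        findings.append(("shortened_url", match.group()))
    return findings
```

A scanner like this only catches crude obfuscation. Instructions written in plain natural language, such as a request to copy data into a "debug log," pass straight through, which is why the method pairs it with behavioral testing.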

In practice

Implement capability-based permissions for skills.
Utilize behavioral anomaly detection for agent activity.
Sandbox skills with explicit data access bridges.
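
As a rough sketch of the first recommendation, a capability-based model gives each skill an explicit, narrow grant instead of the user's full permissions, and checks that grant at every sensitive operation. Everything below is hypothetical: Capability, SkillManifest, SkillRuntime, and CapabilityError are names invented for this example and are not part of OpenClaw.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    # Hypothetical capability names, chosen for illustration only.
    READ_CALENDAR = auto()
    SEND_EMAIL = auto()
    READ_CLIPBOARD = auto()
    NETWORK_EGRESS = auto()


class CapabilityError(PermissionError):
    """Raised when a skill uses a capability it was not granted."""


@dataclass(frozen=True)
class SkillManifest:
    name: str
    granted: frozenset[Capability] = field(default_factory=frozenset)


class SkillRuntime:
    """Mediates every sensitive operation a skill attempts (deny by default)."""

    def __init__(self, manifest: SkillManifest) -> None:
        self.manifest = manifest
        # Each entry is (skill name, capability name, allowed?).
        self.audit_log: list[tuple[str, str, bool]] = []

    def require(self, capability: Capability) -> None:
        allowed = capability in self.manifest.granted
        # Denied attempts are logged too, so they can feed anomaly detection.
        self.audit_log.append((self.manifest.name, capability.name, allowed))
        if not allowed:
            raise CapabilityError(
                f"{self.manifest.name} was not granted {capability.name}"
            )


runtime = SkillRuntime(
    SkillManifest("calendar-helper", frozenset({Capability.READ_CALENDAR}))
)
runtime.require(Capability.READ_CALENDAR)  # allowed: explicitly granted
try:
    runtime.require(Capability.READ_CLIPBOARD)  # denied: never granted
except CapabilityError as err:
    print(err)
```

The design choice is that a compromised calendar skill can reach only the calendar, rather than everything the user can. The audit log of allowed and denied attempts is one possible input for the behavioral anomaly detection listed above.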

Topics

AI Agent Security
Prompt Injection Attacks
Supply Chain Risk
Autonomous Agents
Delegated Compromise

Best for: CTO, AI Architect, VP of Engineering/Data, AI Security Engineer, Security Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.