Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Summary
Evidence-Carrying Multimodal Agents (ECA) are proposed to address "hallucination-to-action conversion" (H2AC), a security failure in which a multimodal agent's false perceptual claims trigger privileged actions. ECA formalizes H2AC by treating free-form model text as inadmissible evidence for tool authorization. The architecture decomposes tool calls into action-critical predicates, obtains typed certificates from constrained verifiers (DOM, OCR, AX), and uses a deterministic gate that grants privileges only when these certificates support them. This approach converts opaque MLLM risk into auditable verifier risk. Verifier red-teaming with 1,900 attacks showed gate bypass falling from 15% to 1.3% after four targeted hardening steps. ECA achieved a 0% unsafe-action rate (UAR) on a 200-task end-to-end pipeline (Wilson 95% upper bound 2.67%) and on a 120-task browser proof-of-concept (upper bound 4.3%). A direct HACR audit on 500 tasks showed that ECA blocked all unsupported action-critical claims, whereas naive agents and prompt-only defenses showed HACR values of 100.0% and 49.6%, respectively.
Key takeaway
AI engineers deploying multimodal agents with tool-use capabilities must implement robust authorization boundaries. Relying on model-generated text for action preconditions is a critical security vulnerability. Instead, adopt an evidence-carrying architecture such as ECA, so that every action-critical predicate is externally certified by a trusted verifier before the action executes. This shifts risk from opaque model beliefs to auditable verifier failures, significantly reducing unsafe actions.
Key insights
Hallucination-to-action conversion in multimodal agents is mitigated by requiring external, auditable evidence certificates for action-critical predicates.
Principles
- Separate interpretation from authority.
- Treat free-form model text as inadmissible evidence.
- Convert opaque MLLM risk to auditable verifier risk.
Method
ECA decomposes tool calls into action-critical predicates, obtains typed certificates from constrained DOM/OCR/AX verifiers, and uses a deterministic gate to authorize actions only with certified predicates.
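The gating step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the `Predicate` and `Certificate` types, the `gate` function, the predicate names, and the element id `btn-42` are all hypothetical. The point it shows is that the gate sees only typed certificates from an allow-listed set of verifiers, and free-form model text never enters the decision.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumption: only these verifier kinds are trusted to issue certificates.
TRUSTED_VERIFIERS = frozenset({"DOM", "OCR", "AX"})


@dataclass(frozen=True)
class Predicate:
    """One action-critical fact a tool call depends on (hypothetical schema)."""
    name: str    # e.g. "visible_label"
    target: str  # element the tool call acts on
    value: str   # expected value


@dataclass(frozen=True)
class Certificate:
    """A typed statement from a verifier about a single predicate."""
    predicate: Predicate
    verifier: str  # which verifier issued it
    holds: bool    # whether the verifier confirmed the predicate


def gate(required: Iterable[Predicate],
         certificates: Iterable[Certificate]) -> Tuple[bool, List[Predicate]]:
    """Deterministic check: allow only if every required predicate is certified.

    Certificates from unknown verifiers, or with holds=False, are ignored.
    """
    certified = {
        c.predicate
        for c in certificates
        if c.holds and c.verifier in TRUSTED_VERIFIERS
    }
    missing = [p for p in required if p not in certified]
    return (not missing, missing)


# Usage: a click on "btn-42" needs two predicates; one is backed only by
# a certificate from an untrusted source, so the gate refuses the action.
label_ok = Predicate("visible_label", "btn-42", "Delete draft")
enabled = Predicate("is_enabled", "btn-42", "true")
certs = [
    Certificate(label_ok, "DOM", True),
    Certificate(enabled, "LLM", True),  # not an allow-listed verifier
]
allowed, missing = gate([label_ok, enabled], certs)
print(allowed, [p.name for p in missing])  # False ['is_enabled']
```

Because the decision is a pure function of the required predicates and the certificates, a refused action can be audited by inspecting which predicate lacked support.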
In practice
- Implement a policy gate for tool calls.
- Use DOM, OCR, and AX verifiers for evidence.
- Harden verifiers against adversarial inputs.
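As one way to picture the verifier side of these steps, the following Python sketch shows a hypothetical DOM-style verifier that certifies a "visible label" predicate. Everything here is assumed for illustration: the element snapshot is a plain dict with made-up keys (`id`, `text`, `visible`, `opacity`), and the two hardening measures (Unicode normalization with zero-width stripping, and failing closed on hidden elements) are generic examples, not the four hardening steps evaluated in the paper.

```python
import unicodedata

# Zero-width characters sometimes used to disguise text (illustrative set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_label(text: str) -> str:
    """Canonicalize text so visually equivalent labels compare equal."""
    text = unicodedata.normalize("NFKC", text)               # fold compatibility forms
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)  # drop zero-width chars
    return " ".join(text.split()).casefold()                  # collapse spaces, ignore case


def verify_visible_label(element: dict, expected: str) -> dict:
    """Return a certificate-like dict for the predicate 'element shows expected'.

    Fails closed: text on an element that is hidden or fully transparent
    is never certified, even if the text matches.
    """
    visible = element.get("visible") is True and element.get("opacity", 1.0) > 0.0
    matches = normalize_label(element.get("text", "")) == normalize_label(expected)
    return {
        "verifier": "DOM",
        "predicate": ("visible_label", element.get("id"), expected),
        "holds": visible and matches,
    }


# Usage: the same text is certified when visible and refused when hidden.
shown = {"id": "btn-1", "text": "Pa\u200by  Now", "visible": True}
hidden = {"id": "btn-2", "text": "Pay now", "visible": False}
print(verify_visible_label(shown, "Pay now")["holds"])   # True
print(verify_visible_label(hidden, "Pay now")["holds"])  # False
```

A certificate like this would then be passed to the policy gate, which decides whether the tool call may proceed.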
Topics
- Multimodal Agents
- Hallucination-to-Action Conversion (H2AC)
- Evidence-Carrying Agents
- Tool Authorization
- Verifier Red-Teaming
- Agent Security
Best for: AI Architect, CTO, VP of Engineering/Data, AI Scientist, AI Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.