Hallucination as Exploit: Evidence-Carrying Multimodal Agents

2026-05-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Evidence-Carrying Multimodal Agents (ECA) are proposed to address "hallucination-to-action conversion" (H2AC), a security failure in which a multimodal agent's false perceptual claims trigger privileged actions. ECA formalizes H2AC by treating free-form model text as inadmissible evidence for tool authorization. The architecture decomposes each tool call into action-critical predicates, obtains typed certificates from constrained verifiers (DOM, OCR, and AX, the accessibility tree), and uses a deterministic gate that grants privileges only when those certificates support them. This converts opaque MLLM risk into auditable verifier risk. Verifier red-teaming with 1,900 attacks showed gate bypass falling from 15% to 1.3% after four targeted hardening steps. ECA achieved a 0% unsafe-action rate (UAR) on a 200-task end-to-end pipeline (Wilson 95% upper bound 2.67%) and on a 120-task browser proof-of-concept (upper bound 4.3%). In a direct HACR audit on 500 tasks, ECA blocked all unsupported action-critical claims, whereas the reported rates for naive agents and prompt-only defenses were 100.0% and 49.6%, respectively.

Key takeaway

AI engineers deploying multimodal agents with tool-use capabilities must implement robust authorization boundaries. Relying on model-generated text to establish action preconditions is a critical security vulnerability. Instead, adopt an evidence-carrying architecture such as ECA, so that every action-critical predicate is certified by a trusted external verifier before execution. This shifts risk from opaque model beliefs to auditable verifier failures; in the reported evaluations it yielded a 0% unsafe-action rate.

Key insights

Hallucination-to-action conversion in multimodal agents is mitigated by requiring external, auditable evidence certificates for action-critical predicates.

Principles

Separate interpretation from authority.
Treat free-form model text as inadmissible evidence.
Convert opaque MLLM risk to auditable verifier risk.

Method

ECA decomposes tool calls into action-critical predicates, obtains typed certificates from constrained DOM/OCR/AX verifiers, and uses a deterministic gate to authorize actions only with certified predicates.
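The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the names `Predicate`, `Certificate`, `ToolCall`, `TRUSTED_VERIFIERS`, and `gate` are all hypothetical, and the choice to deny calls that carry no predicates is an assumption made here so the gate fails closed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Predicate:
    """An action-critical claim that must hold before a tool call may run."""
    kind: str        # e.g. "text_equals" (hypothetical predicate type)
    target: str      # what the claim is about, e.g. a selector
    value: str = ""  # expected value, if any


@dataclass(frozen=True)
class Certificate:
    """A typed result issued by a constrained verifier for one predicate."""
    predicate: Predicate
    verifier: str    # which verifier issued it
    holds: bool      # whether the verifier found the predicate to be true


@dataclass(frozen=True)
class ToolCall:
    name: str
    predicates: tuple[Predicate, ...]
    model_rationale: str = ""  # free-form model text; the gate never reads it


# Assumption: only these verifier kinds are admissible evidence sources.
TRUSTED_VERIFIERS = frozenset({"dom", "ocr", "ax"})


def gate(call: ToolCall, certificates: Iterable[Certificate]) -> bool:
    """Deterministically authorize a call only if every predicate is certified."""
    certified = {
        c.predicate
        for c in certificates
        if c.verifier in TRUSTED_VERIFIERS and c.holds
    }
    # Fail closed: a call with no predicates is denied rather than waved through.
    return bool(call.predicates) and all(p in certified for p in call.predicates)
```

In this sketch the model's rationale travels with the call but has no influence on the decision, so a confident but false description of the screen cannot by itself unlock a privileged action.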

In practice

Implement a policy gate for tool calls.
Use DOM, OCR, and AX verifiers for evidence.
Harden verifiers against adversarial inputs.
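As a rough illustration of what a constrained DOM verifier might look like, the sketch below checks whether a visible button with an expected label exists and returns a typed certificate. Everything here is an assumption for illustration: the `Certificate` fields, the `verify_button_label` function, and the hidden-element check are hypothetical and are not taken from the paper's four hardening steps. It uses only Python's standard `html.parser`.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Certificate:
    predicate: str  # human-readable statement of what was checked
    verifier: str   # which verifier issued the certificate
    holds: bool     # outcome of the check
    evidence: str   # what the verifier actually observed, for auditing


class _ButtonCollector(HTMLParser):
    """Collects the text of <button> elements, skipping ones marked hidden."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: list[str] = []
        self._buf: list[str] | None = None  # None while outside a visible button

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            attr = dict(attrs)
            style = (attr.get("style") or "").replace(" ", "")
            hidden = "hidden" in attr or "display:none" in style
            self._buf = None if hidden else []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._buf is not None:
            self.labels.append("".join(self._buf).strip())
            self._buf = None


def verify_button_label(html: str, expected: str) -> Certificate:
    """Certify that a non-hidden button with exactly the expected label exists."""
    parser = _ButtonCollector()
    parser.feed(html)
    parser.close()
    return Certificate(
        predicate=f"button_label == {expected!r}",
        verifier="dom",
        holds=expected in parser.labels,
        evidence=repr(parser.labels),
    )
```

The verifier answers one narrow question and records what it saw, which is what makes its failures auditable. A production verifier would need far more than this inline-style check, since elements can be hidden through stylesheets, overlays, or off-screen positioning.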

Topics

Multimodal Agents
Hallucination-to-Action Conversion (H2AC)
Evidence-Carrying Agents
Tool Authorization
Verifier Red-Teaming
Agent Security

Best for: AI Architect, CTO, VP of Engineering/Data, AI Scientist, AI Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.