ARB4WM: An Adversarial Robustness Benchmark for World Models in Continuous Control

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

ARB4WM is a new, unified evaluation framework for pre-deployment robustness and risk assessment of world-model agents operating under visual perturbations. It addresses a gap in existing evaluations, which lack a comprehensive benchmark for adversarial threats at the policy, value, and latent-dynamics levels of world-model agents. The framework defines five white-box loss objectives across these three levels and combines them with single-step or multi-step perturbation strategies and with temporal attack modes, including full-frame, half-sequence, and sparse-frame exposure. Evaluations of four Dreamer-style agents on 20 tasks from MetaWorld and the DeepMind Control Suite showed that attacks targeting value estimation, latent representations, and recurrent state-space model (RSSM) dynamics are as damaging as direct policy disruption. Early or frequent perturbations were especially harmful, and input-level defenses offered limited recovery against adaptive attacks.

Key takeaway

AI security engineers developing world-model agents for safety-critical systems should expand robustness assessments beyond action-space attacks. Integrate attack objectives that target value estimation, latent representations, and RSSM dynamics, and add temporal exposure protocols. Input-level defenses alone are insufficient against adaptive attacks, so run a comprehensive pre-deployment risk assessment to surface the vulnerabilities that action-space testing misses.

Key insights

Adversarial robustness for world models requires multi-level, temporal attack assessment beyond just action-space disruption.

Principles

Assessing world-model robustness requires multi-component attack objectives.
When and how often an attack is applied strongly affects vulnerability.
Input-level defenses offer limited protection against adaptive attacks.
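A minimal sketch of what multi-component objectives can look like, assuming a toy NumPy agent with a pixel encoder, an actor head, a critic head, and a one-step latent transition. The model (`encode`, `act`, `critic`, `imagine`) and the four objectives are hypothetical illustrations, not ARB4WM's five loss definitions, and a real Dreamer-style RSSM has stochastic and deterministic state parts that this toy omits. Each objective scores how much a perturbation `delta` damages one component, so an attacker can maximize any single term or a weighted combination.

```python
import numpy as np

# Hypothetical toy agent; NOT a Dreamer model, only a stand-in for illustration.
rng = np.random.default_rng(0)
OBS, LAT, ACT = 16, 6, 2
W_enc = rng.normal(size=(LAT, OBS)) / 4.0        # encoder: pixels -> latent
W_pi = rng.normal(size=(ACT, LAT))               # actor head
w_v = rng.normal(size=LAT)                       # critic head
W_dyn = rng.normal(size=(LAT, LAT + ACT)) / 2.0  # one-step latent transition


def encode(x):
    return np.tanh(W_enc @ x)


def act(z):
    return np.tanh(W_pi @ z)


def critic(z):
    return float(w_v @ z)


def imagine(z, a):
    # Stand-in for a world-model transition: next latent from (latent, action).
    return np.tanh(W_dyn @ np.concatenate([z, a]))


def attack_losses(x, delta):
    """Illustrative attacker objectives; larger values mean more damage."""
    z, z_adv = encode(x), encode(x + delta)
    a, a_adv = act(z), act(z_adv)
    return {
        "policy": float(np.sum((a_adv - a) ** 2)),  # action deviation
        "value": critic(z) - critic(z_adv),         # drop in the value estimate
        "latent": float(np.sum((z_adv - z) ** 2)),  # representation shift
        "dynamics": float(np.sum((imagine(z_adv, a_adv) - imagine(z, a)) ** 2)),  # imagined next-state drift
    }


def combined_loss(x, delta, weights):
    """Weighted sum of the selected objectives (weights are hypothetical)."""
    losses = attack_losses(x, delta)
    return sum(weights[k] * losses[k] for k in weights)


if __name__ == "__main__":
    x = rng.uniform(0.0, 1.0, size=OBS)
    delta = rng.uniform(-0.03, 0.03, size=OBS)
    print(attack_losses(x, delta))
```

Splitting the objectives this way lets an evaluation report which component (policy, value, representation, or dynamics) a given perturbation hurts most, rather than measuring action deviation alone.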

Method

ARB4WM defines five white-box loss objectives across the policy, value, and latent-dynamics levels and combines them with single-step or multi-step perturbation strategies and temporal attack modes (full-frame, half-sequence, and sparse-frame exposure).
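The difference between single-step and multi-step perturbation strategies can be sketched on a toy differentiable value function. This assumes FGSM-style and PGD-style (no random start) L-infinity attacks with budget `eps`; the functions `value`, `single_step`, and `multi_step` are hypothetical, and ARB4WM's actual optimizers, budgets, and step counts are not stated in this summary. Both attacks push the value estimate down while keeping the perturbed frame inside [0, 1].

```python
import numpy as np

# Toy stand-in for a critic: v(x) = w . tanh(W x). Purely illustrative; a
# Dreamer critic reads latent states, not raw pixels.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) / 4.0
w = rng.normal(size=8)


def value(x):
    return float(w @ np.tanh(W @ x))


def value_grad(x):
    # d/dx [w . tanh(Wx)] = W^T (w * (1 - tanh(Wx)^2))
    h = np.tanh(W @ x)
    return W.T @ (w * (1.0 - h ** 2))


def project(x, delta, eps):
    # Keep the perturbation inside the L-inf ball and the frame inside [0, 1].
    delta = np.clip(delta, -eps, eps)
    return np.clip(x + delta, 0.0, 1.0) - x


def single_step(x, eps):
    # FGSM-style: one signed-gradient step that pushes the value estimate down.
    return project(x, -eps * np.sign(value_grad(x)), eps)


def multi_step(x, eps, alpha, steps):
    # PGD-style: repeated small signed steps, projected after each one.
    # Returns the iterate with the lowest value seen (including delta = 0).
    delta = np.zeros_like(x)
    best, best_v = delta.copy(), value(x)
    for _ in range(steps):
        delta = project(x, delta - alpha * np.sign(value_grad(x + delta)), eps)
        v = value(x + delta)
        if v < best_v:
            best, best_v = delta.copy(), v
    return best


if __name__ == "__main__":
    x = rng.uniform(0.0, 1.0, size=16)  # a flattened toy "frame"
    eps = 8 / 255
    for name, d in [("single-step", single_step(x, eps)),
                    ("multi-step", multi_step(x, eps, alpha=eps / 4, steps=10))]:
        print(f"{name}: value {value(x):.3f} -> {value(x + d):.3f}")
```

Keeping the best iterate is a design choice for the sketch: it guarantees the multi-step attack never ends worse than the clean input, whereas returning only the last iterate can overshoot on a non-convex objective.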

In practice

Evaluate world models using ARB4WM's multi-level attack objectives.
Test temporal attack modes like half-sequence or sparse-frame exposure.
Do not rely on input-level defenses alone; they offered limited recovery against adaptive attacks.
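The temporal attack modes can be expressed as boolean masks over an episode of T steps. This is a minimal sketch: which half is attacked in half-sequence mode (`half`) and the sparse-frame interval (`every`) are assumptions exposed as hypothetical parameters, not values taken from ARB4WM.

```python
import numpy as np


def exposure_mask(T, mode, half="first", every=5):
    """Boolean mask over T steps: True where the observation is perturbed."""
    t = np.arange(T)
    if mode == "full":
        return np.ones(T, dtype=bool)
    if mode == "half":
        # Assumption: attack either the first or the second half of the episode.
        return (t < T // 2) if half == "first" else (t >= T // 2)
    if mode == "sparse":
        # Assumption: attack every `every`-th frame, starting at step 0.
        return t % every == 0
    raise ValueError(f"unknown mode: {mode}")


def perturb_episode(frames, deltas, mask):
    """Apply precomputed per-step perturbations only where the mask is True."""
    # frames, deltas: arrays of shape (T, ...); broadcast the mask over pixels.
    m = mask.reshape((-1,) + (1,) * (frames.ndim - 1))
    return np.clip(frames + np.where(m, deltas, 0.0), 0.0, 1.0)


if __name__ == "__main__":
    print(exposure_mask(10, "sparse", every=3).astype(int))  # [1 0 0 1 0 0 1 0 0 1]
```

In a closed-loop rollout, the same mask would instead decide at each step whether to compute an attack on the current observation, since the agent's earlier actions change the frames it sees.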

Topics

World Models
Adversarial Robustness
Continuous Control
Latent Dynamics
Risk Assessment
DeepMind Control Suite

Code references

zaoanguai/ARB4WM

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.