ARB4WM: An Adversarial Robustness Benchmark for World Models in Continuous Control

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning; Robotics & Autonomous Systems; Cybersecurity & Data Privacy) · Depth: Expert, quick

Summary

ARB4WM is a unified evaluation framework for pre-deployment robustness and risk assessment of world-model agents operating under visual perturbations. It targets a gap in existing evaluations, which lack a comprehensive benchmark of adversarial threats across the policy, value, and latent-dynamics levels of a world-model agent. The framework defines five white-box loss objectives across these three levels and combines them with single-step or multi-step perturbation strategies and with three temporal attack modes: full-frame, half-sequence, and sparse-frame exposure. Evaluations of four Dreamer-style agents on 20 tasks from MetaWorld and the DeepMind Control Suite show that attacks on value estimation, latent representations, and the RSSM (recurrent state-space model) dynamics are as damaging as attacks that disrupt the policy directly. Early or frequent perturbations were especially harmful, and input-level defenses recovered only limited performance against adaptive attacks.
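
The three temporal attack modes amount to choosing which timesteps of an episode receive a perturbation. The sketch below is not ARB4WM's code; it encodes each mode as a boolean mask. The helper name `temporal_mask`, the use of the first half of the episode for "half-sequence", and the every-`stride`-th-frame rule for "sparse-frame" are assumptions made for illustration.

```python
from typing import List


def temporal_mask(mode: str, horizon: int, stride: int = 4, first_half: bool = True) -> List[bool]:
    """Return, for each of `horizon` timesteps, whether that frame is perturbed.

    Hypothetical helper; the mode definitions below are illustrative assumptions:
      - "full":   every frame is perturbed.
      - "half":   only the first half (or, with first_half=False, the second half).
      - "sparse": every `stride`-th frame, starting at t = 0.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if mode == "full":
        return [True] * horizon
    if mode == "half":
        mid = horizon // 2
        # (t < mid) is True in the first half; compare against the requested half.
        return [(t < mid) == first_half for t in range(horizon)]
    if mode == "sparse":
        if stride <= 0:
            raise ValueError("stride must be positive")
        return [t % stride == 0 for t in range(horizon)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("full", "half", "sparse"):
        mask = temporal_mask(mode, horizon=8)
        # Prints "x" for perturbed frames and "." for clean ones.
        print(f"{mode:>6}", "".join("x" if m else "." for m in mask))
```

For a horizon of 8 this perturbs 8, 4, and 2 frames respectively, so counting the `True` entries gives each mode's frame budget. The `first_half` flag allows a direct comparison of early versus late exposure, the axis along which the summary reports early perturbations as especially harmful.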

Key takeaway

AI security engineers building world-model agents for safety-critical systems should extend robustness assessments beyond action-space attacks. Include attack objectives that target value estimation, latent representations, and RSSM dynamics, and add temporal exposure protocols that vary when and how often perturbations occur. Input-level defenses alone are insufficient; a comprehensive pre-deployment risk assessment is needed to surface these vulnerabilities and support system reliability.

Key insights

Assessing the adversarial robustness of world models requires attacks at multiple levels (policy, value, and latent dynamics) and across time, not only action-space disruption.
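
To make "multi-level" concrete, the sketch below shows one plausible loss at each level, computed from the outputs of a clean and a perturbed forward pass. These are generic choices (KL divergence between diagonal-Gaussian action distributions, squared value deviation, squared latent distance), not the five objectives ARB4WM defines; the function names and the weighted combination are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return float(np.sum(per_dim))


def policy_loss(clean_mu, clean_std, adv_mu, adv_std):
    # Policy level (illustrative): drift of the perturbed action distribution from the clean one.
    return gaussian_kl(clean_mu, clean_std, adv_mu, adv_std)


def value_loss(clean_v, adv_v):
    # Value level (illustrative): squared shift in the critic's return estimate.
    return float((adv_v - clean_v) ** 2)


def latent_loss(clean_z, adv_z):
    # Latent level (illustrative): squared L2 distance between clean and perturbed latent states.
    return float(np.sum((adv_z - clean_z) ** 2))


def combined_loss(terms, weights):
    # Hypothetical weighted sum of per-level terms; an attacker would ascend it w.r.t. input pixels.
    return sum(weights[name] * term for name, term in terms.items())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, std, z = rng.normal(size=4), np.ones(4), rng.normal(size=32)
    terms = {
        "policy": policy_loss(mu, std, mu + 0.1, std),
        "value": value_loss(3.0, 2.5),
        "latent": latent_loss(z, z + 0.05),
    }
    print(combined_loss(terms, {"policy": 1.0, "value": 1.0, "latent": 0.1}))
```

Each term is zero when the perturbed pass matches the clean one, so every level can be attacked on its own or in combination; the weights here are arbitrary and would need tuning to the scale of each term.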

Method

ARB4WM defines five white-box loss objectives across policy, value, and latent-dynamics levels, combining them with single-step/multi-step perturbation strategies and temporal attack modes.
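
The difference between single-step and multi-step perturbation can be sketched on a toy differentiable critic. The example assumes an L-infinity budget `eps`, inputs scaled to [0, 1], and a hypothetical stand-in value head `v(x) = w · tanh(A x)` with an analytic gradient; in a real Dreamer-style agent the gradient would instead come from backpropagating through the encoder, RSSM, and actor or critic. The single-step variant follows FGSM and the multi-step variant follows PGD; whether ARB4WM uses exactly these update rules is an assumption.

```python
import numpy as np


def value(x, A, w):
    """Toy stand-in for a critic: v(x) = w . tanh(A x). Hypothetical, not a Dreamer critic."""
    return float(w @ np.tanh(A @ x))


def attack_grad(x, A, w):
    """Gradient of the attack loss L(x) = -v(x); ascending L pushes the value estimate down."""
    h = np.tanh(A @ x)
    return -(A.T @ (w * (1.0 - h ** 2)))


def single_step(x, A, w, eps):
    """FGSM-style: one signed-gradient step of size eps, clipped to the valid pixel range."""
    return np.clip(x + eps * np.sign(attack_grad(x, A, w)), 0.0, 1.0)


def multi_step(x, A, w, eps, alpha, steps):
    """PGD-style: repeated signed steps of size alpha, projected into the L-inf ball of radius eps.

    Returns the lowest-value iterate seen, counting the clean input as the starting point.
    """
    best, best_v = x.copy(), value(x, A, w)
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(attack_grad(x_adv, A, w))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
        v = value(x_adv, A, w)
        if v < best_v:
            best, best_v = x_adv.copy(), v
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(16, 64))
    w = rng.normal(size=16)
    x = rng.uniform(size=64)  # a flattened toy "frame" in [0, 1]
    eps = 8 / 255
    x_fgsm = single_step(x, A, w, eps)
    x_pgd = multi_step(x, A, w, eps, alpha=2 / 255, steps=10)
    print("clean", value(x, A, w), "fgsm", value(x_fgsm, A, w), "pgd", value(x_pgd, A, w))
```

Keeping the best iterate rather than the last guarantees the multi-step attack never scores worse than the clean input on this objective. Applying either function only at timesteps selected by a temporal exposure mask would combine a perturbation strategy with a temporal attack mode.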

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.