Trajectory-Level Redirection Attacks on Vision-Language-Action Models

2026-06-11 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

Trajectory-level redirection attacks target Vision-Language-Action (VLA) policies, which use natural-language instructions inside closed-loop robot control. The threat model, termed "command-preserving trajectory redirection," covers prompts that appear benign yet alter the robot's final physical outcome over an entire task trajectory. Unlike prior attacks that target low-level actions, this one exploits the recurring role the text prompt plays whenever a VLA replans. The researchers formalize it as a prompt-only threat: the attacker selects a single, near-benign prompt before an episode begins, while the policy and environment remain fixed. To discover such prompts, they develop an on-policy prompt search method based on rollouts. Experiments in both simulation and hardware show that these subtle prompt perturbations redirect VLA rollouts to attacker-specified targets, exposing a vulnerability in instruction grounding.

Key takeaway

For AI Security Engineers evaluating robotic systems, this research points to a critical vulnerability in VLA models: seemingly innocuous prompt changes can hijack a robot's task outcome. Prioritize testing VLA policies against "command-preserving trajectory redirection" attacks, and add prompt validation and anomaly detection to reduce the risk from subtle adversarial inputs that exploit the recurring role of language in robot control.

Key insights

VLA models are vulnerable to subtle, command-preserving prompt attacks that redirect robot task outcomes.

Principles

Text prompts have a recurring role in VLA control.
Adversarial prompts can redirect entire task trajectories.
Near-benign prompts can control robot physical outcomes.

Method

An on-policy prompt search method uses rollouts to discover perturbations. These perturbations track a target task while satisfying command-preserving constraints.
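
The search loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: `toy_policy` is a hypothetical stand-in for a VLA that moves a 2-D point toward the last colour word it reads, `preserves_command` approximates the command-preserving constraint by requiring the benign prompt to survive word for word with only a few inserted words, and candidates are scored by how close the rollout's final state lands to the attacker's target.

```python
import math
from typing import Callable, Iterable, Tuple

State = Tuple[float, float]
Policy = Callable[[str, State], Tuple[float, float]]

# Hypothetical goal locations used only by the toy policy below.
GOALS = {"red": (1.0, 0.0), "blue": (-1.0, 0.0)}


def toy_policy(prompt: str, state: State) -> Tuple[float, float]:
    """Stand-in for a VLA: step halfway toward the last colour word in the prompt."""
    goal = (0.0, 0.0)
    for word in prompt.split():
        goal = GOALS.get(word, goal)
    return (0.5 * (goal[0] - state[0]), 0.5 * (goal[1] - state[1]))


def rollout(policy: Policy, prompt: str, start: State, steps: int = 20) -> State:
    """Closed loop: the same fixed prompt is re-read at every control step."""
    x, y = start
    for _ in range(steps):
        dx, dy = policy(prompt, (x, y))
        x, y = x + dx, y + dy
    return (x, y)


def preserves_command(benign: str, candidate: str, max_extra_words: int = 4) -> bool:
    """Assumed constraint: the benign words appear in order, plus few insertions."""
    b, c = benign.split(), candidate.split()
    remaining = iter(c)
    is_subsequence = all(word in remaining for word in b)
    return is_subsequence and len(c) - len(b) <= max_extra_words


def search_prompt(policy: Policy, benign: str, candidates: Iterable[str],
                  start: State, target: State) -> Tuple[str, float]:
    """Return the admissible prompt whose rollout ends closest to the target."""
    best_prompt = benign
    best_dist = math.dist(rollout(policy, benign, start), target)
    for candidate in candidates:
        if not preserves_command(benign, candidate):
            continue  # discard prompts that drop or reorder the original command
        dist = math.dist(rollout(policy, candidate, start), target)
        if dist < best_dist:
            best_prompt, best_dist = candidate, dist
    return best_prompt, best_dist


benign = "put the cup in the red bin"
candidates = [
    "put the cup in the blue bin",                 # rejected: command changed
    "please put the cup in the red bin",           # admissible, no redirection
    "put the cup in the red bin by the blue mat",  # admissible, redirects the toy
]
best, dist = search_prompt(toy_policy, benign, candidates, (0.0, 0.0), GOALS["blue"])
print(best, dist)
```

In this toy setting the search selects the prompt that still contains the full benign command yet ends the rollout at the blue goal. A real search would generate candidates automatically and score them against rollouts of the actual policy; the word-subsequence check is only a crude proxy for a command-preserving constraint.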

In practice

Test VLA policies against trajectory-level redirection.
Implement prompt validation for VLA systems.
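
One possible shape for prompt validation is a strict allowlist, sketched below. This is an assumption for illustration only: the source does not describe or evaluate any particular defense, and the names `normalize` and `validate_prompt` are hypothetical. The idea is to accept a prompt only if, after normalization, it exactly matches a pre-approved command, so that inserted words are rejected before the policy ever sees them.

```python
import re
from typing import Iterable, Optional


def normalize(prompt: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", prompt.lower())
    return " ".join(cleaned.split())


def validate_prompt(prompt: str, allowed_commands: Iterable[str]) -> Optional[str]:
    """Return the matching allowlisted command, or None to reject the prompt."""
    normalized = normalize(prompt)
    for command in allowed_commands:
        if normalize(command) == normalized:
            return command
    return None


allowed = ["put the cup in the red bin"]
print(validate_prompt("Put the cup in the red bin.", allowed))
print(validate_prompt("put the cup in the red bin by the blue mat", allowed))
```

Exact matching is deliberately strict: it also rejects harmless paraphrases, so it suits deployments with a small, fixed command set. Whether looser checks would withstand the attacks described here is not something this summary establishes.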

Topics

Vision-Language-Action Models
Robot Control
Adversarial Attacks
Prompt Engineering
Trajectory Redirection
AI Security

Best for: Research Scientist, AI Scientist, Robotics Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.