TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

2026-06-04 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

TempoVLA is a Vision-Language-Action (VLA) model for speed-controllable robot manipulation. It addresses a limitation of existing VLAs, which execute at a single fixed speed even though manipulation tasks often alternate between fast transit phases and slow, precise contact stages. The approach builds on the observation that the magnitude of the predicted action governs how fast the robot moves. It combines two components: Variable-Speed Trajectory Augmentation (VSTA), which re-times demonstration data to target speeds while preserving motion semantics, and a model-side conditioning mechanism that explicitly feeds the desired speed to the policy. Experiments confirm bidirectional speed control (execution both faster and slower than the default) and show that VSTA also improves performance at the default 1× speed through better data utilization. When paired with a large multimodal model (LMM), TempoVLA adjusts speed dynamically, accelerating through low-risk phases and decelerating for high-risk ones.

Key takeaway

For robotics engineers building manipulation systems, TempoVLA shows how a single policy can run at different speeds. If your applications need varying execution speeds (fast for transit, slow for precision), consider adding explicit speed conditioning and Variable-Speed Trajectory Augmentation (VSTA) to your VLA training pipeline. The reported results suggest that this enables dynamic speed adjustment based on task context, which may benefit both efficiency and safety, and that it can improve performance even at the default speed.

Key insights

Robot execution speed can be dynamically controlled in VLAs by conditioning on desired speed and augmenting training data.

Principles

Robot tasks benefit from variable execution speed
Action magnitude correlates with robot movement speed
Speed-based trajectory augmentation can improve VLA performance, even at the default 1× speed

Method

TempoVLA uses Variable-Speed Trajectory Augmentation (VSTA) to re-time demonstrations by merging or splitting actions, combined with a model-side mechanism to condition the policy on the target speed.
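
The merge-and-split idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes actions are per-step relative displacements (so that summing consecutive steps yields one larger, faster step and dividing a step yields several smaller, slower ones), and it supports only integer speed-up or slow-down factors. The function name retime_actions is hypothetical, and the sketch ignores details a real pipeline would need, such as gripper commands, rotations, and re-sampling the matching observations.

```python
import numpy as np

def retime_actions(actions: np.ndarray, speed: float) -> np.ndarray:
    """Re-time a (T, D) sequence of delta actions (assumed representation).

    speed = k   (k = 2, 3, ...) merges every k consecutive actions into one.
    speed = 1/k splits every action into k equal parts.
    The total displacement of the trajectory is preserved in both cases.
    """
    actions = np.asarray(actions, dtype=float)
    if speed <= 0:
        raise ValueError("speed must be positive")

    if speed >= 1.0:
        k = int(round(speed))
        if not np.isclose(k, speed):
            raise ValueError("this sketch supports only integer speed-ups")
        # Pad with zero actions so the length is a multiple of k.
        pad = (-len(actions)) % k
        padded = np.concatenate([actions, np.zeros((pad, actions.shape[1]))])
        # Merge: sum each group of k consecutive delta actions.
        return padded.reshape(-1, k, actions.shape[1]).sum(axis=1)

    k = int(round(1.0 / speed))
    if not np.isclose(1.0 / k, speed):
        raise ValueError("this sketch supports only 1/k slow-downs")
    # Split: each action becomes k smaller actions of equal size.
    return np.repeat(actions / k, k, axis=0)
```

For a three-step trajectory, retime_actions(a, 2.0) returns two larger steps and retime_actions(a, 0.5) returns six smaller ones; in both cases the summed displacement equals that of the original, which is one simple reading of "preserving motion semantics".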

In practice

Integrate explicit speed conditioning into VLA architectures
Apply VSTA to augment VLA training datasets
Pair the policy with a large multimodal model (LMM) for dynamic speed control
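
As a rough illustration of the last point, the sketch below maps a risk label, such as one an LMM might produce for the current task phase, to a speed multiplier that is passed to the policy alongside the instruction. Everything here is an assumption for illustration: the summary does not specify the speed values TempoVLA supports, the form of the LMM output, or how the speed is encoded for the policy. The names SPEED_BY_RISK, PolicyInput, choose_speed, and build_input are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping; the actual speed levels are not given in the summary.
SPEED_BY_RISK = {"low": 2.0, "medium": 1.0, "high": 0.5}

@dataclass
class PolicyInput:
    instruction: str
    speed: float  # desired speed multiplier, fed to the policy as conditioning

def choose_speed(risk: str) -> float:
    """Map a risk label to a speed multiplier, defaulting to 1x if unknown."""
    return SPEED_BY_RISK.get(risk.strip().lower(), 1.0)

def build_input(instruction: str, risk: str) -> PolicyInput:
    """Bundle the language instruction with the speed chosen for this phase."""
    return PolicyInput(instruction=instruction, speed=choose_speed(risk))
```

Falling back to 1× for unrecognized labels keeps the policy at its default behavior when the risk estimate is missing or malformed.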

Topics

Robot Manipulation
Vision-Language-Action Models
Speed Control
Trajectory Augmentation
Multimodal AI
Robotics

Best for: Research Scientist, AI Scientist, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.