Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

2026-04-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

Vision-Language-Action (VLA) models, while effective in sequential decision-making, exhibit fragility to minor environmental changes due to trajectory overfitting. Researchers propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that enhances decision performance without requiring base model fine-tuning. PDF addresses spurious correlations through uncertainty-based data augmentation and action voting, utilizing an adaptive scheduler to balance performance and efficiency. To bolster stability, PDF incorporates a lightweight perturbation module that retrospectively adjusts action logits using delayed feedback, thereby correcting overconfidence. Experiments demonstrate PDF's consistent gains, achieving a +7.4% success rate on LIBERO and a +10.3 human normalized score on Atari, establishing a practical approach for reliable test-time adaptation in multimodal decision-making agents. The code is available on GitHub.
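The augmentation-and-voting idea can be illustrated with a minimal sketch. It assumes a discrete action space, which fits Atari and VLA models that emit logits over tokenized action bins. The entropy threshold, the Gaussian-noise augmentation, and the helper names (`vote_action`, `augment`) are illustrative assumptions, not the PDF implementation. The sketch uses the original view directly when the policy is confident. When the policy is uncertain, it spends extra forward passes on perturbed views and takes a majority vote.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs):
    # Shannon entropy (in nats) of a categorical distribution.
    return float(-np.sum(probs * np.log(probs + 1e-12)))

def augment(obs, rng, noise_std=0.05):
    # Hypothetical augmentation: additive Gaussian pixel noise.
    return obs + rng.normal(0.0, noise_std, size=obs.shape)

def vote_action(policy, obs, rng, threshold=1.0, num_views=8):
    """Return (action, used_voting).

    Confident steps (entropy <= threshold) take the argmax of the original
    view. Uncertain steps take a majority vote over the original view plus
    `num_views` augmented views; ties go to the lowest action index.
    """
    logits = policy(obs)
    if entropy(softmax(logits)) <= threshold:
        return int(np.argmax(logits)), False
    votes = [int(np.argmax(logits))]
    for _ in range(num_views):
        votes.append(int(np.argmax(policy(augment(obs, rng)))))
    counts = np.bincount(votes, minlength=logits.shape[-1])
    return int(np.argmax(counts)), True
```

In this sketch, the vote can override the clean-view argmax when the perturbed views consistently disagree with it. This is the intended effect of voting against spurious, view-specific cues.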

Key takeaway

Research scientists developing Vision-Language-Action models should consider integrating Perturbation learning with Delayed Feedback (PDF) to improve model robustness and performance at test time. The framework is verifier-free and mitigates trajectory overfitting and overconfidence. It reports gains in task success, such as +7.4% success rate on LIBERO, without fine-tuning the base model. Implementing PDF can lead to more reliable multimodal decision-making agents.

Key insights

PDF improves VLA model robustness at test-time by mitigating overfitting and overconfidence through adaptive perturbation and delayed feedback.

Principles

Mitigate spurious correlations via uncertainty-based augmentation.
Retrospectively adjust action logits with delayed feedback.
Balance performance and efficiency with adaptive scheduling.
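One way to balance performance against efficiency is a budget-tracking threshold. It decides at each step whether to spend extra forward passes on augmentation, and it adjusts itself so that roughly a fixed fraction of steps are augmented. This is a hypothetical design based on an online quantile tracker, not necessarily the scheduler used in PDF. The names `AdaptiveScheduler`, `budget`, and `lr` are illustrative.

```python
import numpy as np

class AdaptiveScheduler:
    """Hypothetical budget-tracking gate: augment only when uncertainty is high."""

    def __init__(self, budget=0.2, init_threshold=1.0, lr=0.05):
        self.budget = budget              # target fraction of augmented steps
        self.threshold = init_threshold   # current uncertainty cut-off
        self.lr = lr                      # threshold step size

    def should_augment(self, uncertainty):
        triggered = uncertainty > self.threshold
        # Raise the threshold after a trigger and lower it otherwise, so it
        # drifts toward the (1 - budget) quantile of observed uncertainty.
        self.threshold += self.lr * (float(triggered) - self.budget)
        return bool(triggered)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sched = AdaptiveScheduler(budget=0.2)
    # Synthetic per-step uncertainties (for example, policy entropies).
    decisions = [sched.should_augment(u) for u in rng.uniform(0.0, 2.0, 20000)]
    print(f"augmented fraction: {np.mean(decisions):.3f}, threshold: {sched.threshold:.2f}")
```

Each update moves the threshold by `lr * (triggered - budget)`. As a result, the long-run fraction of augmented steps equals `budget` up to a term that shrinks as 1/N, which bounds the extra compute however uncertain the environment is.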

Method

PDF uses uncertainty-based data augmentation and action voting, managed by an adaptive scheduler. It learns a lightweight perturbation module to adjust action logits retrospectively, guided by delayed feedback to correct overconfidence.
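The retrospective logit adjustment can be illustrated with a minimal sketch. It assumes the base model's logits are frozen and that the only feedback is a scalar episode outcome, for example 1 for success and 0 for failure, which arrives after the episode ends. The module learns a temperature and a per-action bias with a REINFORCE-style update against a running baseline. This update rule and the class name `DelayedFeedbackPerturbation` are illustrative assumptions, not the paper's stated objective.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a 1-D logit vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

class DelayedFeedbackPerturbation:
    """Hypothetical logit perturbation: z' = z * exp(-log_temp) + bias.

    The frozen base logits z are never modified. Only log_temp (a scalar)
    and bias (one value per action) are learned, and only when feedback arrives.
    """

    def __init__(self, num_actions, lr=0.1, baseline=0.5, momentum=0.9):
        self.log_temp = 0.0
        self.bias = np.zeros(num_actions)
        self.lr = lr
        self.baseline = baseline    # running estimate of the episode outcome
        self.momentum = momentum
        self.trace = []             # (base_logits, action) pairs awaiting feedback

    def act(self, base_logits):
        z = np.asarray(base_logits, dtype=float)
        p = softmax(z * np.exp(-self.log_temp) + self.bias)
        action = int(np.argmax(p))
        self.trace.append((z, action))
        return action, p

    def feedback(self, episode_return):
        """Apply one REINFORCE-style update once the delayed outcome arrives."""
        advantage = episode_return - self.baseline
        scale = np.exp(-self.log_temp)
        g_temp, g_bias = 0.0, np.zeros_like(self.bias)
        for z, a in self.trace:
            p = softmax(z * scale + self.bias)
            g_bias += np.eye(len(p))[a] - p          # d log p_a / d bias
            g_temp += -scale * (z[a] - p @ z)        # d log p_a / d log_temp
        self.log_temp += self.lr * advantage * g_temp
        self.bias += self.lr * advantage * g_bias
        self.baseline = self.momentum * self.baseline + (1 - self.momentum) * episode_return
        self.trace.clear()
```

Consider a failed episode in which the chosen actions were high-confidence argmax picks. The advantage is negative and `z[a] - p @ z` is non-negative, so `log_temp` increases and the action distribution softens. This is one concrete way retrospective feedback can counteract overconfidence without touching the base model's weights.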

In practice

Apply PDF to improve VLA robustness in dynamic environments.
Use PDF to raise success rates in robotic manipulation (evaluated on LIBERO).
Apply PDF to Atari game-playing agents (evaluated by human normalized score).

Topics

Perturbation Learning
Delayed Feedback
Vision-Language-Action Models
Test-Time Adaptation
Uncertainty-based Data Augmentation

Code references

zhoujiahuan1991/CVPR2026-PDF

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.