The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

Lie Diffuser Actor (LDA) is a diffusion framework designed to correct the "Euclidean Fallacy" common in existing diffusion-based Vision-Language-Action (VLA) policies for robotic manipulation. These policies represent SE(3) poses as flat R^12 vectors (a flattened 3×3 rotation matrix plus a 3-vector translation) and treat them as unconstrained Euclidean coordinates. This causes three problems: manifold drift (samples that are no longer valid rigid transforms), broken equivariance under coordinate-frame changes, and non-geodesic trajectories with excessive kinematic cost. LDA instead operates intrinsically on SE(3). It injects noise through left-invariant stochastic differential equations (SDEs), predicts scores in the tangent space, and retracts samples onto the group via the exponential map. By construction, this formulation eliminates manifold drift, guarantees coordinate-frame equivariance, and yields geodesically optimal trajectories. On CALVIN ABC->D, LDA raises the average task length (the average number of consecutively completed instructions in five-task chains) from 3.27 to 3.51, a 7.3% relative increase, and it outperforms baselines on real-robot tasks.
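To make the manifold-drift claim concrete, here is a minimal NumPy sketch comparing the two noising styles on the rotation block of a pose. It is an illustration under stated assumptions, not LDA's implementation: the pose values, the noise scale, and the helper names (hat, exp_so3, flatten_pose, orthogonality_error) are hypothetical choices for this example. Adding isotropic Gaussian noise to the flat R^12 vector leaves a 3×3 block that is no longer orthonormal. Sampling a tangent vector and applying the exponential map instead keeps the result exactly on SO(3).

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-9:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def flatten_pose(R, t):
    """Pack an SE(3) pose into the flat R^12 vector a Euclidean policy would use."""
    return np.concatenate([R.reshape(-1), t])

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero exactly when R is orthogonal."""
    return np.linalg.norm(R.T @ R - np.eye(3))

rng = np.random.default_rng(0)
sigma = 0.1                                  # hypothetical noise scale
R0 = exp_so3(np.array([0.3, -0.2, 0.5]))     # hypothetical clean rotation
t0 = np.array([0.4, 0.0, 0.2])               # hypothetical clean translation

# Euclidean fallacy: isotropic Gaussian noise on all 12 coordinates.
v_noisy = flatten_pose(R0, t0) + sigma * rng.standard_normal(12)
R_flat = v_noisy[:9].reshape(3, 3)

# Intrinsic alternative: noise in the tangent space, retracted with the exponential map.
R_lie = R0 @ exp_so3(sigma * rng.standard_normal(3))

print(f"flat R^12 noise: orth. error {orthogonality_error(R_flat):.3f}, det {np.linalg.det(R_flat):.3f}")
print(f"tangent + exp:   orth. error {orthogonality_error(R_lie):.1e}, det {np.linalg.det(R_lie):.3f}")
```

Only the rotation block is checked because the three translation entries live in R^3 anyway. The drift comes from treating the nine rotation entries as unconstrained coordinates.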

Key takeaway

For machine learning engineers building diffusion-based Vision-Language-Action policies for robotic manipulation, the lesson is that representing SE(3) poses as flat R^12 vectors introduces geometric errors: manifold drift, broken frame equivariance, and non-geodesic trajectories. Adopting the Lie Diffuser Actor (LDA) approach, which operates intrinsically on SE(3), removes manifold drift by construction and guarantees geodesic optimality. In the reported experiments, these geometrically sound principles raised average task length on CALVIN ABC->D by 7.3%.

Key insights

Lie Diffuser Actor corrects geometric errors in VLA policies by operating intrinsically on the SE(3) manifold, ensuring geodesic optimality.
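Geodesic optimality can be illustrated on the rotation part of a pose by interpolating between two orientations 2.5 rad apart. In this NumPy sketch (the helper names and example rotations are assumptions, not taken from the paper), the flat midpoint (R_a + R_b)/2 that an R^12 representation produces is not a rotation. Its determinant equals cos²(1.25), about 0.1, so the intermediate orientation is badly scaled. The geodesic midpoint R_a exp(½ log(R_a^T R_b)) stays on SO(3) and splits the 2.5 rad rotation into two equal 1.25 rad halves.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-9:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def log_so3(R):
    """Logarithm map SO(3) -> so(3), valid for rotation angles below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-9:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee

def geodesic_distance(Ra, Rb):
    """Rotation angle of Ra^T Rb: the length of the shortest path on SO(3)."""
    return np.linalg.norm(log_so3(Ra.T @ Rb))

def chordal_interp(Ra, Rb, s):
    """Flat (Euclidean) interpolation, as a flat-vector policy would produce."""
    return (1.0 - s) * Ra + s * Rb

def geodesic_interp(Ra, Rb, s):
    """Interpolation along the SO(3) geodesic: Ra exp(s log(Ra^T Rb))."""
    return Ra @ exp_so3(s * log_so3(Ra.T @ Rb))

R_a = exp_so3(np.array([0.2, 0.1, 0.0]))            # hypothetical start orientation
R_b = R_a @ exp_so3(np.array([0.0, 0.0, 2.5]))      # 2.5 rad further about body z

mid_flat = chordal_interp(R_a, R_b, 0.5)
mid_geo = geodesic_interp(R_a, R_b, 0.5)

print("det of flat midpoint:    ", np.linalg.det(mid_flat))
print("det of geodesic midpoint:", np.linalg.det(mid_geo))
print("distance a -> b:         ", geodesic_distance(R_a, R_b))
print("geodesic halves:         ", geodesic_distance(R_a, mid_geo), geodesic_distance(mid_geo, R_b))
```

log_so3 uses the arccos formula, which becomes ill-conditioned as the angle approaches π; a production implementation would need a dedicated branch there.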

Principles

Method

LDA operates intrinsically on SE(3): it injects noise via left-invariant SDEs, predicts scores in the tangent space, and retracts samples back onto SE(3) using the exponential map.
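The three steps of the method (left-invariant noising, a tangent-space target, exponential-map retraction) can be sketched on 4×4 homogeneous matrices. This is a hedged illustration, not the paper's model. A single Gaussian step x0 exp(σε) stands in for the left-invariant SDE, and the "prediction" is the exact tangent target rather than a trained score network. The twist ordering (translation first, then rotation), the helper names, the pose values, and σ are all assumptions. Up to the 1/σ² scale and Jacobian corrections this sketch ignores, the target log(x_noisy^{-1} x0) plays the role of the tangent-space score for this Gaussian step. The final check shows where equivariance comes from: re-expressing both poses in another world frame g leaves the tangent target unchanged, because g cancels in (g x_noisy)^{-1}(g x0), whereas the flat R^12 residual changes.

```python
import numpy as np

def hat(w):
    """3-vector -> skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def _V(K, theta):
    """Left Jacobian of SO(3), used to map twist translation to pose translation."""
    if theta < 1e-9:
        return np.eye(3) + 0.5 * K
    B = (1.0 - np.cos(theta)) / theta**2
    C = (theta - np.sin(theta)) / theta**3
    return np.eye(3) + B * K + C * (K @ K)

def exp_se3(xi):
    """Exponential map se(3) -> SE(3); xi = (rho, omega), translation part first."""
    rho, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-9:
        R = np.eye(3) + K
    else:
        R = (np.eye(3) + np.sin(theta) / theta * K
             + (1.0 - np.cos(theta)) / theta**2 * (K @ K))
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, _V(K, theta) @ rho
    return T

def log_se3(T):
    """Logarithm map SE(3) -> se(3), valid for rotation angles below pi."""
    R, t = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    w = 0.5 * vee if theta < 1e-9 else theta / (2.0 * np.sin(theta)) * vee
    return np.concatenate([np.linalg.solve(_V(hat(w), theta), t), w])

def inv_se3(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3], Ti[:3, 3] = R.T, -R.T @ t
    return Ti

def flat_pose(T):
    """Flat R^12 view of a pose: 9 rotation entries followed by 3 translation entries."""
    return np.concatenate([T[:3, :3].reshape(-1), T[:3, 3]])

rng = np.random.default_rng(1)
sigma = 0.2                                                # hypothetical noise scale
x0 = exp_se3(np.array([0.5, -0.1, 0.3, 0.4, 0.2, -0.6]))   # hypothetical clean pose

# Forward noising: one left-invariant Gaussian step (right-multiplied, body frame).
eps = rng.standard_normal(6)
x_noisy = x0 @ exp_se3(sigma * eps)

# Tangent-space target: the twist that carries x_noisy back to x0.
target = log_se3(inv_se3(x_noisy) @ x0)

# Retraction: apply the (here perfectly predicted) twist through the exponential map.
x_denoised = x_noisy @ exp_se3(target)

# Change of world frame g: the tangent target is unchanged, the flat residual is not.
g = exp_se3(rng.standard_normal(6))
target_g = log_se3(inv_se3(g @ x_noisy) @ (g @ x0))
flat_residual = flat_pose(x0) - flat_pose(x_noisy)
flat_residual_g = flat_pose(g @ x0) - flat_pose(g @ x_noisy)

print("target equals -sigma*eps:", np.allclose(target, -sigma * eps))
print("retraction recovers x0:  ", np.allclose(x_denoised, x0))
print("tangent target invariant:", np.allclose(target, target_g))
print("flat residual invariant: ", np.allclose(flat_residual, flat_residual_g))
```

Right-multiplying by exp(·) is what makes the perturbation left-invariant: left-translating the whole process by any g leaves the law of its increments unchanged.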

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.