The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space
Summary
Lie Diffuser Actor (LDA) is a diffusion framework designed to correct the "Euclidean Fallacy" prevalent in existing diffusion-based Vision-Language-Action (VLA) policies for robotic manipulation. These policies represent SE(3) poses as flat R^12 vectors, which leads to manifold drift, broken equivariance under coordinate transformations, and non-geodesic trajectories with excessive kinematic cost. LDA instead operates intrinsically on SE(3): it injects noise through left-invariant Stochastic Differential Equations (SDEs), predicts scores in the tangent space, and retracts samples onto the manifold via the exponential map. By construction, this formulation eliminates manifold drift, guarantees coordinate-frame equivariance, and ensures geodesic optimality. On CALVIN ABC->D, LDA improved average task length from 3.27 to 3.51, a 7.3% relative increase, and it outperformed baselines in real robot tasks.
Key takeaway
Machine learning engineers developing diffusion-based Vision-Language-Action policies for robotic manipulation should recognize that representing SE(3) poses as flat R^12 vectors introduces critical geometric errors. The Lie Diffuser Actor (LDA) framework, which operates intrinsically on SE(3), is designed to eliminate manifold drift and guarantee geodesic optimality. On CALVIN ABC->D, this approach raised average task length from 3.27 to 3.51, a 7.3% increase.
Key insights
Lie Diffuser Actor corrects geometric errors in VLA policies by operating intrinsically on the SE(3) manifold, ensuring geodesic optimality.
Principles
- Representing SE(3) poses as R^12 vectors causes geometric errors.
- Intrinsic manifold operations prevent drift and ensure equivariance.
- Score matching on tangent space enables geodesic trajectories.
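The first two principles can be illustrated numerically on the rotation part of a pose. The sketch below is not taken from the paper; it assumes a 3x3 rotation matrix stands in for the rotational entries of a flat pose vector, and the helper names (`hat`, `exp_so3`, `orthogonality_error`) and the noise scale `sigma` are chosen here for illustration. It compares adding Gaussian noise directly to the matrix entries with perturbing the rotation through the exponential map.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion near the identity.
        return np.eye(3) + K + 0.5 * (K @ K)
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * (K @ K)

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation."""
    return np.linalg.norm(R.T @ R - np.eye(3))

rng = np.random.default_rng(0)
R0 = exp_so3(np.array([0.3, -0.2, 0.5]))  # an arbitrary valid rotation
sigma = 0.1                               # illustrative noise scale (assumption)

# Euclidean noising: perturb the nine matrix entries independently.
R_flat = R0 + sigma * rng.standard_normal((3, 3))

# Intrinsic noising: sample in the tangent space, then retract with exp.
R_lie = R0 @ exp_so3(sigma * rng.standard_normal(3))

flat_err = orthogonality_error(R_flat)
lie_err = orthogonality_error(R_lie)
print(f"Euclidean noise, orthogonality error: {flat_err:.3e}")
print(f"Tangent-space noise, orthogonality error: {lie_err:.3e}")
```

The entrywise-perturbed matrix is no longer orthogonal, which is the manifold drift described above, while the exponential-map perturbation stays on the rotation group up to floating-point error.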
Method
LDA injects noise via left-invariant SDEs, predicts scores in the tangent space, and retracts samples using the exponential map, operating intrinsically on SE(3) for robotic manipulation.
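The three steps above can be sketched as a geodesic random walk on SE(3). This is a minimal illustration under stated assumptions, not the authors' implementation: poses are 4x4 homogeneous matrices, tangent vectors are ordered as (rotation, translation), the noise scale `sigma` is constant, the update is a simple Euler-Maruyama-style discretization, and `score_fn` is a hypothetical stand-in for the learned score network. The summary does not specify the paper's actual noise schedule or discretization.

```python
import numpy as np

def hat3(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_se3(xi):
    """Exponential map se(3) -> SE(3); xi = (omega, v) as a 6-vector."""
    omega, v = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = hat3(omega)
    if theta < 1e-8:
        a, b, c = 1.0, 0.5, 1.0 / 6.0  # Taylor limits as theta -> 0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
    K2 = K @ K
    g = np.eye(4)
    g[:3, :3] = np.eye(3) + a * K + b * K2        # rotation block
    g[:3, 3] = (np.eye(3) + b * K + c * K2) @ v   # translation block
    return g

def forward_noise_step(g, sigma, dt, rng):
    """One noising step: tangent-space Gaussian increment, applied on the right."""
    xi = sigma * np.sqrt(dt) * rng.standard_normal(6)
    return g @ exp_se3(xi)

def reverse_step(g, t, score_fn, sigma, dt, rng):
    """One denoising step: drift along the tangent-space score, then retract."""
    drift = sigma**2 * score_fn(g, t) * dt
    noise = sigma * np.sqrt(dt) * rng.standard_normal(6)
    return g @ exp_se3(drift + noise)

def is_se3(g, tol=1e-9):
    """Check that g is a valid homogeneous transform."""
    R = g[:3, :3]
    return bool(np.allclose(R.T @ R, np.eye(3), atol=tol)
                and np.isclose(np.linalg.det(R), 1.0, atol=tol)
                and np.allclose(g[3], [0.0, 0.0, 0.0, 1.0], atol=tol))

rng = np.random.default_rng(0)
sigma, dt, steps = 0.5, 0.01, 100  # illustrative values (assumptions)

g = np.eye(4)
for _ in range(steps):
    g = forward_noise_step(g, sigma, dt, rng)
noised_ok = is_se3(g)

# Placeholder score; a trained network would go here.
dummy_score = lambda pose, t: np.zeros(6)
for k in range(steps):
    g = reverse_step(g, 1.0 - k * dt, dummy_score, sigma, dt, rng)
denoised_ok = is_se3(g)
print(noised_ok, denoised_ok)
```

Because every update multiplies the current pose by an exponential of a tangent vector, each intermediate sample remains a valid element of SE(3), so no projection or re-orthogonalization step is needed. Applying the increment on the right is the convention that corresponds to left-invariant vector fields.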
In practice
- Apply Lie Diffuser Actor for robust robotic manipulation.
- Use intrinsic SE(3) operations to avoid manifold drift.
- Expect gains in average task length, as reported on CALVIN ABC->D (3.27 to 3.51).
Topics
- Robotic Manipulation
- Vision-Language-Action
- Diffusion Models
- Lie Groups
- SE(3) Geometry
- Score Matching
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.