The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

2026-06-01 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

Lie Diffuser Actor (LDA) is a diffusion framework designed to correct what its authors call the "Euclidean Fallacy" in existing diffusion-based Vision-Language-Action (VLA) policies for robotic manipulation. Current VLA policies treat SE(3) poses as flat R^12 vectors (nine rotation-matrix entries plus three translation components), which leads to manifold drift, broken equivariance under coordinate transformations, and non-geodesic trajectories with excessive kinematic cost. LDA instead operates intrinsically on SE(3): it injects noise through left-invariant Stochastic Differential Equations (SDEs), predicts scores in the tangent space, and retracts samples onto the group via the exponential map. By construction, this formulation eliminates manifold drift, guarantees coordinate-frame equivariance, and ensures geodesic optimality. On CALVIN ABC->D, LDA raised average task length from 3.27 to 3.51, a 7.3% increase, and it outperformed baselines on real-robot tasks.

Key takeaway

Machine learning engineers building diffusion-based Vision-Language-Action policies for robotic manipulation should recognize that representing SE(3) poses as flat R^12 vectors introduces geometric errors: manifold drift, broken equivariance, and non-geodesic trajectories. The Lie Diffuser Actor (LDA) framework, which operates intrinsically on SE(3), is designed to eliminate manifold drift and guarantee geodesic optimality. The reported 7.3% increase in average task length on CALVIN ABC->D suggests that these geometrically sound principles can translate into better task performance.

Key insights

Lie Diffuser Actor corrects geometric errors in VLA policies by operating intrinsically on the SE(3) manifold, ensuring geodesic optimality.
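To make the geodesic claim concrete, the following is a minimal NumPy sketch, not code from the paper. It compares the midpoint of two rotations obtained by averaging matrix entries (the flat-vector view) with the midpoint along the geodesic on SO(3). It covers only the rotation part of SE(3), and the helper names (hat, vee, so3_exp, so3_log) are illustrative assumptions.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vee(M):
    """Inverse of hat: read the 3-vector out of a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

def so3_exp(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K + 0.5 * K @ K  # series expansion near zero
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * K @ K)

def so3_log(R):
    """Logarithm map from SO(3) to so(3), valid for rotation angles below pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return 0.5 * vee(R - R.T)
    return theta / (2.0 * np.sin(theta)) * vee(R - R.T)

# Two end-effector orientations: identity and a 2 rad rotation about z.
R0 = np.eye(3)
R1 = so3_exp(np.array([0.0, 0.0, 2.0]))

# Flat-vector view: average the matrix entries.
lin_mid = 0.5 * (R0 + R1)

# Intrinsic view: move halfway along the geodesic from R0 to R1.
geo_mid = R0 @ so3_exp(0.5 * so3_log(R0.T @ R1))

det_lin = np.linalg.det(lin_mid)  # below 1: not a rotation matrix
det_geo = np.linalg.det(geo_mid)  # 1 up to rounding: still on SO(3)
```

The entry-wise midpoint has a determinant well below 1, so it is not a valid rotation and would need an ad hoc projection, whereas the geodesic midpoint is the 1 rad rotation about the same axis.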

Principles

Representing SE(3) poses as R^12 vectors causes geometric errors.
Intrinsic manifold operations prevent drift and ensure equivariance.
Score matching on tangent space enables geodesic trajectories.
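The first two principles can be illustrated with a small NumPy experiment. This is a sketch under stated assumptions rather than the authors' code: it uses only the rotation block of a pose, the noise scale sigma is arbitrary, and the helper names are hypothetical. It adds Gaussian noise to the flattened rotation entries, then compares the result with noise drawn in the tangent space and mapped back through the exponential map.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K + 0.5 * K @ K  # series expansion near zero
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * K @ K)

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero exactly when R is orthogonal."""
    return np.linalg.norm(R.T @ R - np.eye(3))

rng = np.random.default_rng(0)
R = so3_exp(np.array([0.3, -0.2, 0.5]))  # a clean rotation
sigma = 0.1                              # arbitrary noise scale (assumption)

# Euclidean noise: perturb the 9 flattened entries independently.
R_flat = (R.reshape(-1) + sigma * rng.standard_normal(9)).reshape(3, 3)

# Intrinsic noise: sample in the tangent space, retract with exp,
# and right-multiply so the perturbation is left-invariant.
R_lie = R @ so3_exp(sigma * rng.standard_normal(3))

euclid_err = orthogonality_error(R_flat)  # clearly nonzero: drifted off SO(3)
lie_err = orthogonality_error(R_lie)      # zero up to floating-point rounding
```

The Euclidean sample is no longer a rotation matrix, which is the manifold drift described above, while the tangent-space sample stays on the group without any projection step.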

Method

LDA injects noise via left-invariant SDEs, predicts scores in the tangent space, and retracts samples using the exponential map, operating intrinsically on SE(3) for robotic manipulation.
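The three steps named above can be sketched on SE(3) with homogeneous 4x4 matrices. This is a minimal illustration, not the paper's implementation: the twist ordering (translation first, then rotation), the Langevin-style update rule, the step sizes, and the random vector standing in for a learned score network are all assumptions made for the example.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) in R^6 to a 4x4 SE(3) matrix."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        R = np.eye(3) + K + 0.5 * K @ K
        V = np.eye(3) + 0.5 * K + K @ K / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * K + B * K @ K
        V = np.eye(3) + B * K + C * K @ K
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def is_se3(T, tol=1e-9):
    """Check orthonormal rotation, unit determinant, and bottom row [0, 0, 0, 1]."""
    R = T[:3, :3]
    return bool(np.allclose(R.T @ R, np.eye(3), atol=tol)
                and abs(np.linalg.det(R) - 1.0) < tol
                and np.allclose(T[3], [0.0, 0.0, 0.0, 1.0], atol=tol))

def forward_noise(g0, sigma, rng):
    """Left-invariant perturbation: right-multiply by exp of tangent-space noise."""
    return g0 @ se3_exp(sigma * rng.standard_normal(6))

def reverse_step(g, score, step, rng):
    """Generic Langevin-style update in the tangent space, retracted via exp."""
    xi = 0.5 * step * score + np.sqrt(step) * rng.standard_normal(6)
    return g @ se3_exp(xi)

rng = np.random.default_rng(0)
g0 = se3_exp(np.array([0.4, -0.1, 0.2, 0.3, -0.5, 0.1]))
g = forward_noise(g0, sigma=0.5, rng=rng)
for _ in range(200):
    fake_score = rng.standard_normal(6)  # stand-in for a network's 6-D output
    g = reverse_step(g, fake_score, step=0.01, rng=rng)
on_manifold = is_se3(g)

# Changing the reference frame by h commutes with right-multiplied noise.
h = se3_exp(np.array([1.0, 0.0, -0.5, 0.2, 0.7, -0.3]))
xi = 0.3 * rng.standard_normal(6)
frame_equivariant = bool(np.allclose(h @ (g0 @ se3_exp(xi)), (h @ g0) @ se3_exp(xi)))
```

Because every update is a group multiplication by an exponential, the sample remains a valid pose after all 200 steps, and applying a frame change before or after the perturbation gives the same result.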

In practice

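Consider Lie Diffuser Actor when a diffusion-based VLA policy must output SE(3) poses for manipulation.
Use intrinsic SE(3) operations (tangent-space noise and exponential-map retraction) to avoid manifold drift.
Track average task length to measure the effect; LDA reported 3.27 to 3.51 on CALVIN ABC->D.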

Topics

Robotic Manipulation
Vision-Language-Action
Diffusion Models
Lie Groups
SE(3) Geometry
Score Matching

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.