UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

UniDexTok, a unified dexterous hand tokenizer, and UDHM, a Unified Dexterous Hand Model, address the fragmentation of data across different dexterous hand hardware. UDHM maps human and robot hand states into a shared 22-DoF semantic interface. UniDexTok learns embodiment-conditioned discrete tokens directly from standardized real joint states, providing a unified representation without relying on retargeting or simulation data. Compared to the UniHM baseline, UniDexTok reduces Mean Per-Joint Angle Error (MPJAE) from 15.63 degrees to 0.16 degrees (a 98.98% reduction) and Mean Per-Joint Position Error (MPJPE) from 18.51 mm to 0.18 mm (a 99.03% reduction), achieving sub-millimeter accuracy. The system also shows zero-shot and few-shot reconstruction capability on new dexterous hands such as the Inspire hand, where few-shot adaptation reduces fingertip forward-kinematics (FK) errors by 58.5–78.8%.
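
The reduction percentages quoted above follow directly from the reported error values. The sketch below recomputes them and shows one common way such metrics are defined. The function names are illustrative, and the metric definitions (mean absolute angle difference, mean Euclidean joint distance) are assumptions; the paper's exact formulas may differ.

```python
import numpy as np

def mpjae_deg(pred, target):
    """Mean absolute per-joint angle error; inputs are (N, J) arrays in degrees.
    Assumed definition, not taken from the paper."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

def mpjpe_mm(pred_xyz, target_xyz):
    """Mean Euclidean per-joint position error; inputs are (N, J, 3) arrays in mm.
    Assumed definition, not taken from the paper."""
    diff = np.asarray(pred_xyz) - np.asarray(target_xyz)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

def relative_reduction_pct(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Values reported in the summary (UniHM baseline vs. UniDexTok).
print(round(relative_reduction_pct(15.63, 0.16), 2))  # MPJAE reduction: 98.98
print(round(relative_reduction_pct(18.51, 0.18), 2))  # MPJPE reduction: 99.03
```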

Key takeaway

For Machine Learning Engineers developing dexterous manipulation systems, UniDexTok offers a retarget-free way to handle heterogeneous robot hand data. Consider using UDHM to standardize diverse hand states and the UniDexTok pipeline to tokenize them. The reported results suggest that this gives a more accurate cross-embodiment state representation, supports zero-shot transfer and efficient few-shot adaptation to new hardware, and can narrow the domain gap between embodiments.

Key insights

UniDexTok unifies heterogeneous dexterous hand data into a shared discrete token space, enabling accurate cross-embodiment state representation and transfer.

Principles

Standardized semantic interfaces improve data usability across diverse hardware.
Learning directly from real hand data can be more accurate than relying on retargeted or simulated data, as the reported reconstruction errors suggest.
Factorized codebooks enhance discrete representation quality for complex states.

Method

UniDexTok uses UDHM for 22-DoF state standardization, then a conditional transformer encoder, factorized vector quantization, and a conditional decoder to learn and reconstruct discrete tokens from real hand states.
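
To make the quantization step concrete, here is a minimal NumPy sketch of one common reading of "factorized" vector quantization: the latent vector is split into groups, and each group is matched to the nearest entry of its own codebook. This is an assumption for illustration only. The paper's actual factorization, codebook sizes, and latent dimensions are not given in this summary, and the conditional encoder and decoder are omitted. The name `factorized_quantize` and the sizes used are hypothetical.

```python
import numpy as np

def factorized_quantize(z, codebooks):
    """Quantize a latent vector group by group.

    z: array of shape (D,), split into len(codebooks) equal chunks.
    codebooks: list of G arrays, each of shape (K, D // G).
    Returns (token_ids, z_q): one code index per group and the
    concatenation of the selected codebook entries.
    """
    chunks = np.split(z, len(codebooks))
    ids, quantized = [], []
    for chunk, book in zip(chunks, codebooks):
        # Squared Euclidean distance from this chunk to every code in its book.
        dists = np.sum((book - chunk) ** 2, axis=1)
        idx = int(np.argmin(dists))
        ids.append(idx)
        quantized.append(book[idx])
    return ids, np.concatenate(quantized)

# Hypothetical sizes: 2 groups, 16 codes per group, 4 dimensions per group.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]
z = rng.normal(size=8)
token_ids, z_q = factorized_quantize(z, codebooks)
print(token_ids, z_q.shape)
```

With G groups of K codes each, the tokenizer can express up to K^G distinct latent states while storing only G × K codebook entries, which is the usual motivation for factorizing a codebook.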

In practice

Use UDHM to standardize diverse human and robot hand kinematics.
Incorporate human hand data as a valid training embodiment for broader pose coverage.
Employ factorized vector quantization for robust discrete state representation in tokenizers.
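
As a rough illustration of the first point, the sketch below maps a hand with fewer joints into a fixed 22-slot state vector plus a validity mask. Only the interface size of 22 comes from the summary. The slot indices, the joint names, the toy hand, and the use of a mask for unused slots are all assumptions; UDHM's actual slot semantics are not described here.

```python
import numpy as np

NUM_SLOTS = 22  # size of the shared interface, as stated in the summary

def to_shared_state(joint_angles, slot_of_joint):
    """Place embodiment-specific joint angles into the shared slot layout.

    joint_angles: dict mapping joint name to angle (radians).
    slot_of_joint: dict mapping joint name to a slot index in [0, NUM_SLOTS).
    Returns (state, mask); mask marks the slots this hand actually fills.
    """
    state = np.zeros(NUM_SLOTS)
    mask = np.zeros(NUM_SLOTS, dtype=bool)
    for name, angle in joint_angles.items():
        slot = slot_of_joint[name]
        state[slot] = angle
        mask[slot] = True
    return state, mask

# Hypothetical low-DoF hand; names and slot indices are invented for this example.
TOY_HAND_SLOTS = {"thumb_yaw": 0, "thumb_bend": 1, "index_bend": 5, "middle_bend": 9}

state, mask = to_shared_state(
    {"thumb_yaw": 0.2, "thumb_bend": 0.5, "index_bend": 1.0, "middle_bend": 0.8},
    TOY_HAND_SLOTS,
)
print(state.shape, int(mask.sum()))
```

Keeping an explicit mask lets a downstream model distinguish a joint that is at zero from a joint the hardware does not have.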

Topics

Dexterous Hands
State Tokenization
Cross-Embodiment Learning
Unified Hand Models
Robot Learning
Zero-Shot Adaptation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.