Human Universal Grasping
Summary
Human Universal Grasping (HUG) is a flow-matching model that generates diverse human grasps for any user-specified object from a single RGB-D image. It targets a known gap: multi-fingered robots still lack human-level grasping generality. To train it, the authors collected 1M-HUGs, an egocentric dataset of 1 million frames (27.8 hours) covering 6,707 object instances across 41 buildings. HUG fuses RGB and depth observations and outputs a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to robot hands for zero-shot grasping. On HUG-Bench, a new simulated benchmark of 90 unseen objects, HUG outperforms state-of-the-art baselines by +23%, and on a challenging 30-object real-world test set it outperforms them by +34%.
Key takeaway
For robotics engineers developing multi-fingered grasping systems, HUG offers a data-driven way to improve grasping generality. Teams can use the HUG model, trained on the 1M-HUGs dataset, to predict human grasps and retarget them to robot hands for zero-shot grasping in everyday environments. The reported gains over existing baselines reach +34% on the real-world test set.
Key insights
Egocentric human grasping data combined with a flow-matching model lets multi-fingered robots grasp unseen objects zero-shot, by retargeting predicted human grasps to the robot hand.
Principles
- Human egocentric data is a natural source for robot grasping.
- Fusing RGB-D observations improves grasp generation.
- Retargeting human grasps enables zero-shot robot manipulation.
Method
Collect 1M-HUGs egocentric human grasp data using smart glasses. Train a flow-matching model that fuses RGB and depth to output wrist translation, rotation, and MANO hand pose for grasp generation.
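To make the method description concrete, here is a minimal sketch of how a flow-matching sampler turns noise into a grasp vector. It is an illustration under stated assumptions, not the authors' implementation: the vector layout (3 translation values, 3 axis-angle rotation values, 45 MANO finger-pose values), the names `sample_grasp`, `split_grasp`, and `make_toy_velocity`, and the Euler integrator are all hypothetical choices. The toy velocity field stands in for the trained network, which in HUG would be conditioned on the fused RGB-D observation.

```python
import numpy as np

# Assumed grasp vector layout (illustrative; the summary does not give dimensions).
TRANSLATION_DIM = 3   # wrist translation (x, y, z)
ROTATION_DIM = 3      # wrist rotation as axis-angle (an assumption)
HAND_POSE_DIM = 45    # MANO finger articulation: 15 joints x 3 axis-angle values
GRASP_DIM = TRANSLATION_DIM + ROTATION_DIM + HAND_POSE_DIM


def sample_grasp(velocity_fn, rng, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps."""
    x = rng.standard_normal(GRASP_DIM)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


def split_grasp(x):
    """Split a flat grasp vector into its three assumed components."""
    r_end = TRANSLATION_DIM + ROTATION_DIM
    return {
        "translation": x[:TRANSLATION_DIM],
        "rotation": x[TRANSLATION_DIM:r_end],
        "hand_pose": x[r_end:],
    }


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained network.

    Returns the straight-line velocity toward a single fixed target. A real
    model would predict the velocity from (x, t) and the RGB-D features.
    """
    def velocity_fn(x, t):
        return (target - x) / (1.0 - t)
    return velocity_fn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.linspace(-1.0, 1.0, GRASP_DIM)
    grasp = split_grasp(sample_grasp(make_toy_velocity(target), rng))
    print({name: part.shape for name, part in grasp.items()})
```

Drawing several noise samples with different seeds and integrating each one is what would yield several distinct grasps for the same object when the learned velocity field is multimodal; the toy field here always converges to one target.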
In practice
- Generate diverse human grasps for any object from RGB-D.
- Retarget predicted grasps to various robot hands.
- Evaluate robot grasping using the HUG-Bench benchmark.
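When comparing grasp success against a baseline, it helps to state whether a gain is absolute (percentage points) or relative. The summary does not say which convention the reported +23% and +34% use, so the sketch below computes both. The function names and the trial outcomes are hypothetical and are not taken from HUG-Bench.

```python
def success_rate(outcomes):
    """Fraction of successful grasp trials; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def compare(method_outcomes, baseline_outcomes):
    """Report the gain of a method over a baseline in both conventions."""
    m = success_rate(method_outcomes)
    b = success_rate(baseline_outcomes)
    return {
        "method": m,
        "baseline": b,
        # Difference in success rate, in percentage points.
        "absolute_gain_points": 100.0 * (m - b),
        # Difference relative to the baseline's own success rate.
        "relative_gain_percent": 100.0 * (m - b) / b if b > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Made-up outcomes for illustration only.
    method = [True] * 8 + [False] * 2
    baseline = [True] * 5 + [False] * 5
    print(compare(method, baseline))
```

With these made-up trials, 8 of 10 successes against 5 of 10 is a 30-point absolute gain but a 60% relative gain, which is why the convention should be stated alongside any reported number.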
Topics
- Robot Grasping
- Human Universal Grasping
- Flow-Matching Models
- Egocentric Data
- RGB-D Perception
- Zero-Shot Learning
Best for: Computer Vision Engineer, Research Scientist, Robotics Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.