Multi-HMR 2: Multi-Person Camera-Centric Human Detection, Mesh Recovery and Tracking
Summary
Multi-HMR 2, published on 2026-06-12, is a DETR-based framework for multi-person, camera-centric human detection, mesh recovery, and tracking. Traditional approaches evaluate meshes relative to the pelvis, which ignores where each person stands in the scene; Multi-HMR 2 instead targets detection accuracy and metric 3D localization in the camera coordinate system. It predicts a scene-consistent camera alongside the human meshes, so metric 3D localization does not require ground-truth intrinsics. By distilling image-based memory features from SAM2, it also extends to tracking and achieves consistent identity association without video supervision. The design is conceptually simple: it uses no handcrafted components, no video input, and no ground-truth cameras. Even so, it achieves state-of-the-art pelvis-centered performance while substantially improving detection accuracy and metric 3D localization.
Key takeaway
For computer vision engineers building multi-person 3D perception systems, Multi-HMR 2 offers a simpler pipeline: metric 3D localization and consistent identity tracking from images alone, with no video supervision, handcrafted components, or ground-truth camera intrinsics. Consider evaluating it where people must be located in camera space in dynamic, real-world environments, especially for human-robot interaction.
Key insights
Multi-HMR 2 integrates camera-centric detection, mesh recovery, and tracking for robust 3D human perception.
Principles
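- Camera-centric recovery, which places each person metrically in the camera coordinate system rather than relative to their own pelvis, is key for real-world human mesh recovery (HMR) applications.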
- Image-based memory features enable consistent identity tracking without video supervision.
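Identity association from per-person features can be illustrated with a minimal sketch. It is not the method used by Multi-HMR 2: the feature vectors, the `associate` helper, the greedy matching rule, and the `min_sim` threshold are all hypothetical, and the sketch only shows how image-level features could link detections to existing identities without any video-specific training.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def associate(tracks, detections, min_sim=0.5):
    """Greedily match detections to track ids, most similar pairs first.

    tracks: dict mapping track id -> stored feature (hypothetical memory).
    detections: list of features for people detected in the current frame.
    Returns a dict mapping detection index -> track id. Detections that
    match no track above min_sim receive a fresh id.
    """
    pairs = sorted(
        ((cosine(feat, det), tid, i)
         for tid, feat in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used = {}, set()
    for sim, tid, i in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if i in assigned or tid in used:
            continue  # each detection and each track is used at most once
        assigned[i] = tid
        used.add(tid)
    next_id = max(tracks, default=-1) + 1
    for i in range(len(detections)):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
    return assigned

# Two known identities and three detections; the third matches nobody.
tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
ids = associate(tracks, detections)
```

In this toy example the first detection joins track 1, the second joins track 0, and the third starts a new identity. A real tracker would also update the stored features over time, which this sketch leaves out.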
Method
Multi-HMR 2 is a DETR-based framework that predicts scene-consistent cameras and human meshes, using SAM2-distilled memory features for identity tracking.
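To make the camera-centric versus pelvis-centered distinction concrete, here is a minimal sketch based on the standard pinhole camera model. The `PinholeCamera` class, the helper functions, and the numeric values are assumptions for illustration only; this summary does not describe how Multi-HMR 2 parameterizes its predicted camera.

```python
from dataclasses import dataclass

@dataclass
class PinholeCamera:
    """Hypothetical container for a predicted scene-level camera."""
    focal_px: float  # focal length in pixels (assumed to be predicted)
    cx: float        # principal point x, in pixels
    cy: float        # principal point y, in pixels

def backproject(cam, u, v, depth_m):
    """Lift pixel (u, v) at depth_m metres to camera coordinates (X, Y, Z)."""
    x = (u - cam.cx) * depth_m / cam.focal_px
    y = (v - cam.cy) * depth_m / cam.focal_px
    return (x, y, depth_m)

def to_pelvis_centered(vertices, pelvis):
    """Subtract the pelvis position from every vertex.

    The result describes body shape and pose only; the person's
    location in the scene is discarded.
    """
    return [tuple(a - b for a, b in zip(vtx, pelvis)) for vtx in vertices]

# A pelvis seen 200 px right of the image centre, 5 m from the camera.
cam = PinholeCamera(focal_px=1000.0, cx=640.0, cy=360.0)
pelvis_cam = backproject(cam, u=840.0, v=360.0, depth_m=5.0)
pelvis_local = to_pelvis_centered([pelvis_cam], pelvis_cam)
```

Here `pelvis_cam` places the person 1 m to the right of the optical axis and 5 m away, while `pelvis_local` collapses to the origin. The same pixel with a different focal length would give a different metric position, which is why a camera estimate is needed when ground-truth intrinsics are unavailable.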
In practice
- Develop human-robot interaction systems.
- Enhance social scene understanding applications.
Topics
- Human Mesh Recovery
- Multi-person Tracking
- 3D Human Localization
- Camera-centric Perception
- DETR
- SAM2
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.