Multi-HMR 2: Multi-Person Camera-Centric Human Detection, Mesh Recovery and Tracking

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision) · Depth: Expert, quick

Summary

Multi-HMR 2, published on 2026-06-12, is a DETR-based framework for multi-person, camera-centric human detection, mesh recovery, and tracking. It addresses limitations of traditional pelvis-centered approaches by targeting detection accuracy and metric 3D localization in the camera coordinate system. The system predicts a scene-consistent camera alongside the human meshes, which lets it localize people in metric 3D without ground-truth intrinsics. By distilling image-based memory features from SAM2, Multi-HMR 2 also extends to tracking and achieves consistent identity association without video supervision. The design is conceptually simple: it uses no handcrafted components, no video input, and no ground-truth cameras. Even so, it achieves state-of-the-art pelvis-centered performance while substantially improving detection accuracy and metric 3D localization.
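To make "metric 3D localization in the camera coordinate system" concrete, the sketch below shows standard pinhole-camera back-projection: given a focal length, a pixel location, and a depth, it recovers a 3D point in metres relative to the camera. This is textbook geometry, not code from Multi-HMR 2. The function names, the field-of-view parameterization, and the example numbers are all assumptions made for illustration, and the summary does not say how the model parameterizes its predicted camera.

```python
import math


def focal_from_fov(fov_deg, image_size_px):
    """Focal length in pixels for a given field of view along one image axis."""
    return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)


def backproject(u, v, depth_m, focal_px, cx, cy):
    """Lift pixel (u, v) at depth_m metres to camera coordinates (x, y, z)."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


def project(point, focal_px, cx, cy):
    """Project a camera-space point (x, y, z) back to pixel coordinates."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Hypothetical example: a 1000 px wide image with a 90 degree horizontal FOV,
# and a person whose root joint appears at pixel (750, 500), 4 m from the camera.
focal = focal_from_fov(90.0, 1000)
root_cam = backproject(750.0, 500.0, 4.0, focal, cx=500.0, cy=500.0)
pixel = project(root_cam, focal, cx=500.0, cy=500.0)
```

The sketch also shows why the camera matters: the same pixel and depth map to a different lateral position if the focal length changes, so a system without ground-truth intrinsics has to estimate a camera to place people in metric space.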

Key takeaway

For computer vision engineers building multi-person 3D perception systems, Multi-HMR 2 offers a simpler pipeline: accurate metric 3D localization and consistent identity tracking without video supervision or handcrafted components. Consider evaluating it to streamline deployment and improve performance in dynamic, real-world environments, especially for human-robot interaction.

Key insights

Multi-HMR 2 integrates camera-centric detection, mesh recovery, and tracking for robust 3D human perception.

Method

Multi-HMR 2 is a DETR-based framework that predicts scene-consistent cameras and human meshes, using SAM2-distilled memory features for identity tracking.
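As a rough illustration of identity association from per-person features, the sketch below matches each new detection to an existing track by cosine similarity, using simple greedy matching. It is a generic technique and an assumption on our part: the summary does not describe how Multi-HMR 2 compares its SAM2-distilled memory features. The function names, the `min_sim` threshold, and the toy 2D embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def associate(tracks, detections, min_sim=0.5):
    """Greedy matching of detections to tracks.

    tracks: dict mapping integer track id -> feature vector.
    detections: list of feature vectors for the current frame.
    Returns one track id per detection; unmatched detections get fresh ids.
    """
    # Score every (track, detection) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(feat, det), tid, j)
         for tid, feat in tracks.items()
         for j, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    assigned = [None] * len(detections)
    used = set()
    for sim, tid, j in pairs:
        if sim < min_sim:
            break  # remaining pairs are too dissimilar to match
        if tid in used or assigned[j] is not None:
            continue  # each track and each detection is matched at most once
        assigned[j] = tid
        used.add(tid)
    # Start a new track for every detection left unmatched.
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if assigned[j] is None:
            assigned[j] = next_id
            next_id += 1
    return assigned


# Toy example: two existing tracks, three detections in the new frame.
tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
ids = associate(tracks, detections)
```

Greedy matching keeps the example short. An optimal assignment solver would be a reasonable substitute when many people have similar features.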

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.