Contrastive Multi-Modal Hypergraph Reasoning for 3D Crowd Mesh Recovery

2026-05-15 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Researchers from Tianjin University, Nanyang Technological University, and Sichuan University introduce Contrastive Multi-modal Hypergraph Reasoning (CoMHR), a framework for multi-person 3D mesh reconstruction in crowded scenes. CoMHR addresses severe occlusion and depth ambiguity by combining RGB features, geometric priors from pseudo-depth maps, and occlusion-aware 3D poses. The method initializes robust node representations, introduces a pelvis depth indicator for global spatial anchoring, and constructs a shared-topology hypergraph to model higher-order crowd dynamics. A hypergraph-based contrastive learning scheme enhances intra-modal discriminability and enforces cross-modal orthogonality, enabling effective global context propagation. In experiments on the Panoptic and GigaCrowd benchmarks, CoMHR achieves new state-of-the-art performance. It outperforms prior methods such as GroupRec, improving pose consistency (OKS) by 7.3% and eliminating reconstruction conflicts (RP = 0.00).

Key takeaway

For research scientists developing 3D human mesh recovery systems, CoMHR shows that integrating multi-modal data (RGB, depth, pose) with hypergraph reasoning and contrastive learning improves accuracy and robustness in dense, occluded crowd scenes. Consider similar multi-modal fusion and higher-order relational modeling to address depth ambiguity and occlusion, especially when working with large-scale crowd datasets.

Key insights

CoMHR fuses multi-modal cues with hypergraph reasoning and contrastive learning for robust 3D crowd mesh recovery.

Principles

Multi-modal fusion enhances robustness in crowded scenes.
Hypergraphs model high-order crowd dynamics effectively.
Contrastive learning improves feature discriminability and orthogonality.
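
To make the hypergraph principle concrete, here is a minimal sketch of one propagation step over a hypergraph of people. The summary does not state CoMHR's propagation rule, so this uses the common HGNN-style normalized operator Dv^-1/2 H W De^-1 H^T Dv^-1/2 as a stand-in. The function name, the toy incidence matrix, and the grouping of people into hyperedges are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hypergraph_propagation_matrix(H, edge_weights=None):
    """Build Dv^-1/2 H W De^-1 H^T Dv^-1/2 from an incidence matrix.

    H has shape (n_nodes, n_edges); H[i, e] = 1 if person i belongs to
    hyperedge e. This is a generic HGNN-style operator (assumption), not
    necessarily the rule used in CoMHR.
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    node_degree = H @ w            # weighted number of hyperedges per node
    edge_degree = H.sum(axis=0)    # number of nodes per hyperedge
    dv_inv_sqrt = 1.0 / np.sqrt(np.maximum(node_degree, 1e-12))
    de_inv = 1.0 / np.maximum(edge_degree, 1e-12)
    left = dv_inv_sqrt[:, None] * H           # Dv^-1/2 H
    middle = left * (w * de_inv)[None, :]     # Dv^-1/2 H W De^-1
    return middle @ left.T                    # ... H^T Dv^-1/2

# Toy crowd (hypothetical): 4 people, hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 8))  # placeholder node features
A = hypergraph_propagation_matrix(H)
X_out = A @ X  # each person's feature now mixes in its hyperedge neighbours
```

Unlike a pairwise graph edge, a single hyperedge connects any number of people at once, so one step already mixes information across a whole group. People 0 and 3 share no hyperedge here, so they exchange nothing in this step.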

Method

CoMHR initializes multi-modal node features, constructs a shared-topology hypergraph, and applies dual-branch contrastive learning to refine features before high-order reasoning and SMPL parameter regression.
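
The contrastive step can be sketched as two loss terms: one that makes each person's embedding distinguishable from the others within a modality, and one that pushes embeddings of different modalities toward orthogonality. The summary does not give the loss definitions, so this pairs a standard InfoNCE loss with a squared-cosine penalty as plausible stand-ins. The function names, the temperature, and the requirement that both modalities share one embedding dimension are assumptions.

```python
import numpy as np

def l2_normalize(Z, eps=1e-12):
    """Scale each row to unit length."""
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), eps)

def info_nce(Z_a, Z_b, temperature=0.1):
    """Intra-modal term: row i of Z_a should match row i of Z_b
    (two views of the same person) and differ from all other rows."""
    logits = l2_normalize(Z_a) @ l2_normalize(Z_b).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def orthogonality_penalty(Z_m1, Z_m2):
    """Cross-modal term: mean squared cosine similarity between the same
    person's embeddings in two modalities (0 when they are orthogonal)."""
    cos = np.sum(l2_normalize(Z_m1) * l2_normalize(Z_m2), axis=1)
    return float(np.mean(cos ** 2))

# Hypothetical usage with random embeddings for 5 people, 16 dimensions each.
rng = np.random.default_rng(0)
rgb_view1 = rng.normal(size=(5, 16))
rgb_view2 = rgb_view1 + 0.05 * rng.normal(size=(5, 16))  # perturbed second view
depth_emb = rng.normal(size=(5, 16))
total = info_nce(rgb_view1, rgb_view2) + orthogonality_penalty(rgb_view1, depth_emb)
```

Minimizing the first term sharpens identity within a modality. Minimizing the second discourages the modalities from encoding redundant information, which is one common reading of "cross-modal orthogonality".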

In practice

Combine RGB, depth, and pose features for complex scene analysis.
Use pelvis depth as a global spatial anchor for depth ordering.
Employ contrastive learning to align and disentangle multi-modal features.
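
As one possible reading of the pelvis-depth suggestion above, the sketch below samples a pseudo-depth map around each person's 2D pelvis keypoint and sorts people from near to far. The window size, the median sampling, the function names, and the convention that larger values mean farther from the camera are all assumptions. The summary does not describe how CoMHR computes its pelvis depth indicator.

```python
import numpy as np

def pelvis_depth(depth_map, pelvis_xy, half_window=2):
    """Median pseudo-depth in a small window around the pelvis keypoint.

    depth_map: (H, W) array; pelvis_xy: (x, y) in pixel coordinates.
    The median is used here to reduce sensitivity to noisy pixels (assumption).
    """
    h, w = depth_map.shape
    x, y = int(round(pelvis_xy[0])), int(round(pelvis_xy[1]))
    if not (0 <= x < w and 0 <= y < h):
        raise ValueError("pelvis keypoint lies outside the depth map")
    x0, x1 = max(x - half_window, 0), min(x + half_window + 1, w)
    y0, y1 = max(y - half_window, 0), min(y + half_window + 1, h)
    return float(np.median(depth_map[y0:y1, x0:x1]))

def depth_order(depth_map, pelvises):
    """Return person indices sorted near-to-far, plus the per-person depths.

    Assumes larger depth values are farther away; reverse the order for
    inverse-depth (disparity-like) maps.
    """
    depths = np.array([pelvis_depth(depth_map, p) for p in pelvises])
    return np.argsort(depths, kind="stable"), depths

# Toy depth map (hypothetical): depth grows from left to right.
toy_depth = np.tile(np.arange(12.0), (10, 1))
toy_pelvises = [(8, 5), (2, 5), (5, 5)]
order, depths = depth_order(toy_depth, toy_pelvises)
```

A single scalar per person gives a global ordering that can be attached to each node as an extra feature, which is cheaper than reasoning over full per-pixel depth for every pair of people.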

Topics

3D Crowd Mesh Recovery
Multi-modal Fusion
Hypergraph Reasoning
Contrastive Learning
Pelvis Depth Indicator

Code references

SunMH-try/CoMHR

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.