HumanSplatHMR: Closing the Loop Between Human Mesh Recovery and Gaussian Splatting Avatar
Summary
HumanSplatHMR is a joint optimization framework for improving 3D human pose recovery and high-fidelity avatar synthesis from video. Existing methods often fail to capture 3D human geometry accurately: ViT-based approaches overfit to 2D views, while NeRF and Gaussian Splatting avatars treat pose and appearance separately, which limits generalization. HumanSplatHMR addresses both problems by closing the loop between geometric pose estimation and differentiable rendering. It refines 3D human poses and learns an avatar at the same time, enabling novel-view and novel-pose synthesis. Unlike prior methods that depend on motion capture or offline pose refinement, HumanSplatHMR starts only from human mesh estimates produced by a state-of-the-art pose estimator, which makes it suitable for in-the-wild scenarios. It backpropagates photometric, segmentation, and depth losses through a differentiable renderer to refine global 3D pose over time, improving pose accuracy and alignment and yielding better novel-view renderings.
Key takeaway
For research scientists developing human avatar or motion capture systems, HumanSplatHMR demonstrates that integrating pose estimation with differentiable rendering significantly improves 3D human geometry recovery and avatar generalization. Consider adopting a closed-loop optimization approach that refines pose parameters directly from image-level losses, rather than relying on pre-computed or motion-captured poses, for more robust in-the-wild performance.
Key insights
Jointly optimizing human pose and avatar rendering improves 3D geometry and generalization for in-the-wild video.
Principles
- Coupling pose estimation with rendering improves pose accuracy and alignment.
- Differentiable rendering enables pose refinement from image losses.
Method
HumanSplatHMR backpropagates photometric, segmentation, and depth losses through a differentiable renderer to refine global 3D pose parameters and position over time.
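The idea of refining a pose from image-level losses can be illustrated with a deliberately tiny sketch. It is not the HumanSplatHMR implementation: the "avatar" here is a single isotropic 2D Gaussian blob, the "pose" is only its 2D center, the gradient is derived by hand rather than by a differentiable rendering library, and the depth term is omitted for brevity. All names, sizes, and weights (`render`, `refine`, `SIGMA`, `W_PHOTO`, `W_MASK`, and so on) are assumptions made for illustration.

```python
import math

H, W = 32, 32                 # toy image size (assumption)
SIGMA = 4.0                   # blob radius in pixels (assumption)
COLOR = 0.8                   # known, fixed blob intensity (assumption)
W_PHOTO, W_MASK = 1.0, 1.0    # hypothetical loss weights


def render(cx, cy):
    """Render a blob centred at (cx, cy); returns (image, soft mask)."""
    mask = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
             for x in range(W)] for y in range(H)]
    image = [[COLOR * a for a in row] for row in mask]
    return image, mask


def loss_and_grad(cx, cy, target_image, target_mask):
    """Photometric + mask loss and its analytic gradient w.r.t. the centre."""
    image, mask = render(cx, cy)
    loss = gx = gy = 0.0
    for y in range(H):
        for x in range(W):
            a = mask[y][x]
            r_img = image[y][x] - target_image[y][x]
            r_mask = a - target_mask[y][x]
            loss += W_PHOTO * r_img ** 2 + W_MASK * r_mask ** 2
            # dLoss/dalpha, then chain rule through dalpha/dcentre.
            dl_da = 2 * W_PHOTO * COLOR * r_img + 2 * W_MASK * r_mask
            gx += dl_da * a * (x - cx) / SIGMA ** 2
            gy += dl_da * a * (y - cy) / SIGMA ** 2
    return loss, gx, gy


def refine(cx, cy, target_image, target_mask, lr=0.1, steps=150):
    """Plain gradient descent on the centre, starting from a rough estimate."""
    history = []
    for _ in range(steps):
        loss, gx, gy = loss_and_grad(cx, cy, target_image, target_mask)
        history.append(loss)
        cx -= lr * gx
        cy -= lr * gy
    return cx, cy, history


# "Observed" frame: the blob is really at (20, 12).
target_image, target_mask = render(20.0, 12.0)
# Rough initial estimate, standing in for an off-the-shelf pose estimator.
cx, cy, history = refine(16.0, 15.0, target_image, target_mask)
print(f"refined centre: ({cx:.3f}, {cy:.3f}), final loss: {history[-1]:.6f}")
```

The same pattern scales up conceptually: replace the blob with a posed Gaussian Splatting avatar, the center with body pose and global position parameters, and the hand-written gradient with automatic differentiation through the renderer.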
In practice
- Initialize poses from an off-the-shelf human mesh estimator instead of motion capture.
- Optimize pose and avatar jointly when the goal is novel-view or novel-pose synthesis.
Topics
- HumanSplatHMR
- Human Mesh Recovery
- Gaussian Splatting
- Differentiable Rendering
- Novel View Synthesis
Best for: Research Scientist, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.