PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision) · Depth: Expert, quick

Summary

PhyGenHOI is a framework for generating physically accurate and visually faithful 4D Human-Object Interaction (HOI) scenes. It takes a static 3D human and a target object, both represented as 3D Gaussian Splats (3DGS), and synthesizes a dynamic interaction from a text prompt such as "punching" or "kicking." The system couples a generative human motion model, a Motion Diffusion Model (MDM), with explicit physical simulation of the object using the Material Point Method (MPM). Human and object are both expressed as 3D Gaussians, which gives the pipeline a shared, differentiable representation. Three mechanisms supervise the interaction: a Windowed Attraction Loss for temporal synchronization, a Contact-Driven Re-simulation step for momentum transfer, and a Masked Video-SDS objective for contact fidelity. In the reported experiments, PhyGenHOI produces physically consistent 4D HOI across diverse actions, humans, and objects, and outperforms existing baselines.
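To make the idea of contact-driven momentum transfer concrete, here is a minimal sketch in Python with NumPy. It is not the paper's method: PhyGenHOI simulates the object with MPM, whereas this sketch swaps in a textbook point-mass collision impulse purely for illustration. The function names, the spherical contact test, and the `restitution` parameter are all assumptions that do not come from the source.

```python
import numpy as np


def first_contact_frame(hand_traj, obj_center, radius):
    """Return the index of the first frame where the hand is within `radius`
    of the object center, or None if it never gets that close.

    Hypothetical contact test (sphere around the object), not from the paper.
    hand_traj: (T, 3) array of hand positions; obj_center: (3,) array.
    """
    dists = np.linalg.norm(hand_traj - obj_center, axis=1)
    hits = np.nonzero(dists <= radius)[0]
    return int(hits[0]) if hits.size else None


def object_velocity_after_contact(hand_vel, obj_vel, normal,
                                  m_hand, m_obj, restitution=0.0):
    """Point-mass impulse along the contact normal (hand -> object).

    Stand-in for a physical re-simulation step: it only shows how momentum
    could pass from the striking limb to the object at the contact frame.
    """
    n = normal / np.linalg.norm(normal)
    # Relative velocity of the object with respect to the hand, along n.
    closing = float(np.dot(obj_vel - hand_vel, n))
    if closing >= 0.0:
        # Bodies are separating or at rest relative to each other: no impulse.
        return obj_vel.copy()
    impulse = -(1.0 + restitution) * closing / (1.0 / m_hand + 1.0 / m_obj)
    return obj_vel + (impulse / m_obj) * n
```

In a pipeline of this kind, the detected contact frame would mark where the object simulation restarts, and the post-contact velocity would seed that restart. How PhyGenHOI actually performs its re-simulation is not specified in this summary.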

Key takeaway

For computer vision engineers building human-object interaction simulations, PhyGenHOI addresses a common failure of purely generative pipelines: motion that looks plausible but is physically inconsistent at contact. It synthesizes dynamic 4D scenes in which a human punches or kicks an object, with momentum transferred at the moment of contact and the contact region rendered faithfully. Because interactions are driven by text prompts, the approach is relevant to virtual environments and character animation. If your pipeline generates human motion and object motion separately, consider coupling the two as PhyGenHOI does.

Key insights

PhyGenHOI couples generative human motion with explicit physical object simulation for realistic 4D human-object interactions.

Principles

Method

PhyGenHOI combines a Motion Diffusion Model for human motion with the Material Point Method for object simulation, representing both human and object as 3D Gaussians. Interaction is supervised by a Windowed Attraction Loss, a Contact-Driven Re-simulation step, and a Masked Video-SDS objective.
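This summary does not give the exact form of the Windowed Attraction Loss, so the following Python/NumPy sketch shows only one plausible reading. It assumes the loss is a mean squared distance between a tracked human joint and a target point on the object, evaluated only over frames inside a temporal window around the intended contact frame. The function name, the binary window, and every parameter are hypothetical.

```python
import numpy as np


def windowed_attraction_loss(joint_traj, target_point, contact_frame, half_width):
    """Mean squared joint-to-target distance over a temporal window.

    Hypothetical formulation, not taken from the paper.
    joint_traj:    (T, 3) positions of one human joint, e.g. a hand.
    target_point:  (3,) point on the object the joint should reach.
    contact_frame: frame index around which contact is expected.
    half_width:    frames on each side of contact_frame included in the window.
    """
    joint_traj = np.asarray(joint_traj, dtype=float)
    target_point = np.asarray(target_point, dtype=float)
    frames = np.arange(joint_traj.shape[0])
    # Binary window: 1 for frames near the expected contact, 0 elsewhere.
    window = (np.abs(frames - contact_frame) <= half_width).astype(float)
    sq_dist = np.sum((joint_traj - target_point) ** 2, axis=1)
    # Normalize by the window size so the loss does not scale with its width.
    return float(np.sum(window * sq_dist) / max(np.sum(window), 1.0))
```

Restricting the penalty to a window leaves the generated motion free outside the contact interval, which is one way a loss like this could synchronize the strike with the object without constraining the wind-up or follow-through. A differentiable version would use an autodiff framework in place of NumPy.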

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.