PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

2026-05-28 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision · Depth: Expert, quick

Summary

PhyGenHOI is a framework for generating physically accurate and visually faithful 4D Human-Object Interaction (HOI) scenes. It takes a static 3D human and a target object, both represented as 3D Gaussian Splats (3DGS), and synthesizes a dynamic interaction from a text prompt such as "punching" or "kicking." The system couples a generative human motion model, a Motion Diffusion Model (MDM), with explicit physical simulation of the object using the Material Point Method (MPM). Representing both human and object as 3D Gaussians gives the pipeline a single, differentiable representation. Three mechanisms supervise the interaction: a Windowed Attraction Loss for temporal synchronization, a Contact-Driven Re-simulation step for momentum transfer, and a Masked Video-SDS objective for contact fidelity. The reported experiments show physically consistent 4D HOI across diverse actions, humans, and objects, outperforming existing baselines.

Key takeaway

For computer vision engineers building human-object interaction simulations, PhyGenHOI offers a way to address physical inconsistencies in generated motion. It synthesizes dynamic 4D scenes in which a human interacts with an object, for example by punching or kicking it, with explicit momentum transfer and supervised contact fidelity. Interactions are generated from text prompts, which makes the approach relevant to virtual environments and character animation. Consider its coupling of generative motion with physical simulation for your next project.

Key insights

PhyGenHOI couples generative human motion with explicit physical object simulation for realistic 4D human-object interactions.

Principles

Unify human and object via 3D Gaussians.
Synchronize generative motion with physical simulation.
Re-simulate contact for momentum transfer.
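The third principle can be illustrated with a minimal sketch. It assumes contact is detected by a simple distance threshold and that momentum is handed over as in a perfectly inelastic collision with an object at rest. Neither assumption comes from the source, where the object response is computed by MPM simulation, and all function names and parameters here are hypothetical.

```python
import numpy as np

def first_contact_frame(effector_traj, object_center, radius):
    """Return the index of the first frame where the effector is within
    `radius` of the object center, or None if contact never happens."""
    traj = np.asarray(effector_traj, dtype=float)      # shape (T, 3)
    center = np.asarray(object_center, dtype=float)    # shape (3,)
    dists = np.linalg.norm(traj - center, axis=1)
    hits = np.nonzero(dists <= radius)[0]
    return int(hits[0]) if hits.size else None

def effector_velocity(effector_traj, frame, dt):
    """Backward finite-difference velocity of the effector at `frame`."""
    traj = np.asarray(effector_traj, dtype=float)
    if frame == 0:
        return np.zeros(traj.shape[1])
    return (traj[frame] - traj[frame - 1]) / dt

def transferred_velocity(v_effector, m_effector, m_object):
    """Object velocity after a perfectly inelastic hit on an object at rest
    (assumed toy model, not the MPM response)."""
    v = np.asarray(v_effector, dtype=float)
    return m_effector * v / (m_effector + m_object)

# Hypothetical fist trajectory moving toward an object at the origin.
traj = [[3, 0, 0], [2, 0, 0], [1, 0, 0], [0, 0, 0]]
frame = first_contact_frame(traj, [0, 0, 0], radius=1.0)
v0 = transferred_velocity(effector_velocity(traj, frame, dt=0.5), 1.0, 3.0)
```

In a re-simulation setting, `frame` would mark where the object simulation restarts and `v0` would serve as the object's initial velocity from that frame onward.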

Method

PhyGenHOI integrates a Motion Diffusion Model for human motion with the Material Point Method for object simulation, representing both human and object as 3D Gaussians. It supervises the interaction with a Windowed Attraction Loss, Contact-Driven Re-simulation, and a Masked Video-SDS objective.
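The source does not give the form of the Windowed Attraction Loss, so the following is only one plausible reading: a penalty that pulls a human end-effector toward the object, but only for frames inside a temporal window around the intended contact. The function name, its arguments, and the squared-distance penalty are assumptions made for illustration.

```python
import numpy as np

def windowed_attraction_loss(effector_traj, object_center, contact_frame, half_window):
    """Mean squared effector-to-object distance over the frames in
    [contact_frame - half_window, contact_frame + half_window], clipped
    to the valid frame range. Frames outside the window are ignored."""
    traj = np.asarray(effector_traj, dtype=float)      # shape (T, 3)
    center = np.asarray(object_center, dtype=float)    # shape (3,)
    start = max(0, contact_frame - half_window)
    stop = min(traj.shape[0], contact_frame + half_window + 1)
    sq_dist = np.sum((traj[start:stop] - center) ** 2, axis=1)
    return float(sq_dist.mean())

# Hypothetical 5-frame trajectory: far away at frame 0, on the object afterwards.
traj = np.zeros((5, 3))
traj[0] = [10.0, 0.0, 0.0]
loss_late = windowed_attraction_loss(traj, [0, 0, 0], contact_frame=3, half_window=1)
loss_early = windowed_attraction_loss(traj, [0, 0, 0], contact_frame=0, half_window=1)
```

Restricting the penalty to a window leaves the rest of the generated motion unconstrained, which is one way a loss of this kind could synchronize contact timing without distorting the wind-up or follow-through.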

In practice

Generate dynamic 4D HOI from text prompts.
Synthesize physically consistent human-object impacts.
Enhance contact fidelity using video priors.
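The last item can be made concrete with a rough sketch. Score distillation sampling (SDS) typically uses a gradient proportional to the difference between a diffusion model's predicted noise and the noise that was injected. The sketch below assumes that "masked" means restricting this gradient to pixels in a contact region. The masking scheme, the array shapes, and the function name are assumptions, and the noise arrays stand in for the outputs of a video diffusion model that is not shown.

```python
import numpy as np

def masked_sds_gradient(eps_pred, eps, contact_mask, weight=1.0):
    """SDS-style per-pixel gradient weight * (eps_pred - eps), zeroed
    outside the contact mask.

    eps_pred, eps: arrays of shape (T, H, W, C), predicted and injected noise.
    contact_mask:  array of shape (T, H, W), 1 inside the contact region.
    """
    eps_pred = np.asarray(eps_pred, dtype=float)
    eps = np.asarray(eps, dtype=float)
    # Add a trailing channel axis so the mask broadcasts over C.
    mask = np.asarray(contact_mask, dtype=float)[..., None]
    return weight * mask * (eps_pred - eps)

# Hypothetical 1-frame, 2x2, 3-channel example with one contact pixel.
eps_pred = np.ones((1, 2, 2, 3))
eps = np.zeros((1, 2, 2, 3))
mask = np.zeros((1, 2, 2))
mask[0, 0, 0] = 1.0
grad = masked_sds_gradient(eps_pred, eps, mask, weight=2.0)
```

Only the masked pixel receives a nonzero gradient, so under this reading the video prior would refine the contact region while leaving the rest of the rendering untouched.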

Topics

4D Generation
Human-Object Interaction
Physical Simulation
Motion Diffusion Models
3D Gaussian Splats
Material Point Method

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.