PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

2026-05-15 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

PhyMotion is a structured, fine-grained motion reward for improving the realism of human motion in generated videos. It addresses the limitations of existing 2D perceptual rewards by recovering SMPL body meshes from generated videos, retargeting them onto a humanoid in the MuJoCo physics simulator, and scoring the resulting 3D trajectories along three dimensions of physical feasibility: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each dimension yields a continuous, interpretable signal for a specific aspect of motion quality. In the reported experiments, PhyMotion reaches 80% average pairwise agreement with human judgments and a Spearman correlation of ρ=0.376, outperforming existing rewards. When used for RL-based post-training, it consistently improves motion realism across autoregressive and bidirectional video generators, yielding a +68 Elo gain in blind human evaluation and an average 7.1% improvement on external evaluators such as VBench.
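The agreement figures above can be read more concretely with a small sketch of how such statistics are typically computed. The code below is an illustration only, not the paper's evaluation code: the function names are hypothetical, the pairwise-agreement definition (fraction of video pairs that the reward orders the same way as humans, skipping human ties) is an assumption, and the Elo conversion assumes the conventional 400-point logistic scale, which the summary does not confirm.

```python
from itertools import combinations


def pairwise_agreement(reward, human):
    """Fraction of pairs ordered the same way by the reward and by humans.

    Pairs where humans express no preference (equal scores) are skipped.
    This definition is an assumption, not taken from the paper.
    """
    agree = total = 0
    for i, j in combinations(range(len(reward)), 2):
        if human[i] == human[j]:
            continue
        total += 1
        if (reward[i] - reward[j]) * (human[i] - human[j]) > 0:
            agree += 1
    return agree / total if total else float("nan")


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def elo_expected_score(elo_gap):
    """Expected win rate for a model rated elo_gap points above its opponent
    (standard logistic Elo formula with a 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


# Toy data, invented for illustration only.
toy_reward = [0.1, 0.4, 0.3, 0.9]
toy_human = [1, 2, 3, 4]
print(pairwise_agreement(toy_reward, toy_human))
print(spearman(toy_reward, toy_human))
print(elo_expected_score(68))
```

On the toy data the reward agrees with humans on 5 of 6 pairs and has a Spearman correlation of 0.8. Under the 400-point assumption, a +68 Elo gap corresponds to the post-trained model being preferred in roughly 60% of head-to-head comparisons.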

Key takeaway

Research scientists developing human video generation models should consider integrating physics-grounded 3D motion rewards such as PhyMotion into their reinforcement learning post-training pipelines. Compared with relying solely on 2D perceptual metrics, this approach aligns better with human perception of motion realism and provides fine-grained diagnostic signals, leading to more physically plausible and natural human movement in generated videos.

Key insights

Physics-grounded 3D motion rewards improve human motion realism in video generation by scoring the physical feasibility of recovered 3D trajectories rather than 2D appearance alone.

Principles

2D perceptual signals are insufficient for robust human motion realism.
Decomposed physical metrics offer interpretable diagnostic signals.
RL post-training with structured 3D rewards improves motion quality.

Method

PhyMotion recovers SMPL meshes from videos, retargets them to a MuJoCo humanoid, and evaluates kinematic, contact/balance, and dynamic feasibility to generate a structured reward for RL post-training.
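To make one of these dimensions concrete, here is a minimal sketch of what a kinematic plausibility check could look like once a trajectory has been retargeted to joint angles. It is not PhyMotion's actual metric, which this summary does not specify: the function name, the per-joint angle limits, the single speed limit, and the "fraction of plausible samples" scoring rule are all assumptions chosen for illustration, and no MuJoCo calls are involved.

```python
def kinematic_score(angles, fps, angle_limits, max_speed):
    """Hypothetical kinematic plausibility score in [0, 1].

    angles:       list of frames, each a list of joint angles in radians
    fps:          frame rate used to turn frame differences into velocities
    angle_limits: one (low, high) pair per joint, in radians
    max_speed:    assumed joint speed limit in radians per second

    Returns the fraction of (frame, joint) samples whose angle lies inside
    its limits and whose finite-difference speed stays under max_speed.
    """
    ok = total = 0
    for t, frame in enumerate(angles):
        for j, q in enumerate(frame):
            low, high = angle_limits[j]
            within_range = low <= q <= high
            if t == 0:
                # No previous frame, so no velocity estimate is available.
                within_speed = True
            else:
                speed = abs(q - angles[t - 1][j]) * fps
                within_speed = speed <= max_speed
            total += 1
            if within_range and within_speed:
                ok += 1
    return ok / total if total else 0.0


# Toy single-joint trajectory, invented for illustration: the last frame
# jumps outside the joint range at an implausible speed.
toy_angles = [[0.0], [0.1], [2.0]]
print(kinematic_score(toy_angles, fps=10, angle_limits=[(-1.5, 1.5)], max_speed=5.0))
```

A score of this kind is continuous and tied to one named failure mode, which is the property the method relies on for interpretable diagnostics. Contact/balance and dynamic feasibility would need simulator quantities such as contact forces and joint torques, which are outside the scope of this sketch.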

In practice

Use SMPL-X and GVHMR for 3D human trajectory recovery.
Integrate MuJoCo for physics-grounded motion simulation.
Combine kinematic, contact, and dynamic scores into a comprehensive reward.
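One simple way to combine the three sub-scores is a weighted average, sketched below. This is an assumed aggregation rule for illustration: the summary does not state how PhyMotion weights or merges its components, and the function name, the equal default weights, and the requirement that each sub-score lie in [0, 1] are hypothetical.

```python
def combine_scores(kinematic, contact, dynamic, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three sub-scores, each expected in [0, 1].

    The weighting scheme is an assumption, not the paper's formula.
    """
    scores = (kinematic, contact, dynamic)
    for name, value in zip(("kinematic", "contact", "dynamic"), scores):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    total_weight = sum(weights)
    if len(weights) != 3 or total_weight <= 0:
        raise ValueError("weights must be three values with a positive sum")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Toy sub-scores, invented for illustration.
print(combine_scores(0.9, 0.6, 0.3))
print(combine_scores(0.9, 0.6, 0.3, weights=(2.0, 1.0, 1.0)))
```

Keeping the sub-scores as separate inputs, rather than only the scalar result, preserves the diagnostic value of the decomposition: a low combined reward can still be traced back to the component that caused it.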

Topics

PhyMotion
3D Motion Reward
Physics-Grounded Evaluation
Reinforcement Learning Post-training
Human Video Generation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.