Policy-as-Data: Learning Generalizable HOI Diffusion Models from Simulated Physics

2026-06-22 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

The "Policy-as-Data" framework synthesizes realistic Human-Object Interactions (HOI) while addressing the limits of motion capture datasets, which are expensive to collect and functionally restricted. Existing data-driven methods often fail to generalize to unseen objects and lose physical consistency over long sequences. The pipeline trains policies with reinforcement learning (RL) in a physics simulator and uses them to generate task-oriented HOI data. A generative model is then trained on the dataset augmented with this synthetic data. To bridge the representation gap between the simulator's simplified body models and standard parametric body models, the framework applies a coarse-to-fine retargeting process. The authors report experiments showing improved generalization to unseen objects, long-horizon generation, and greater dynamic diversity and physical plausibility.

Key takeaway

For machine learning engineers developing embodied avatars or functional virtual environments, this research offers a way around HOI data limitations. Consider adding physics-based simulation and reinforcement learning to your data generation pipeline. According to the reported results, this improves generalization to new objects and yields more physically consistent and dynamically diverse long-horizon HOI synthesis, while reducing reliance on expensive motion capture data.

Key insights

Leveraging physics simulation and RL for data generation improves HOI model generalization and physical consistency.

Principles

Physics simulators can overcome HOI data scarcity.
Bridging representation gaps is crucial for synthetic data integration.
RL-trained policies enable task-oriented data generation.

Method

The "Policy-as-Data" pipeline trains RL policies in a physics simulator to generate task-oriented HOI data, uses coarse-to-fine retargeting to map the simulator's simplified body models onto standard parametric body models, and then trains a generative model on the resulting data.
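
The data flow of such a pipeline can be sketched in a few lines. This is a toy illustration under heavy assumptions, not the paper's implementation: the "simulator" is a 1D update loop, `toy_policy` is a hand-written proportional controller standing in for an RL-trained policy, and `retarget` is an identity-like placeholder. All names (`Frame`, `toy_policy`, `rollout`, `retarget`, `build_dataset`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Frame:
    hand: float     # 1D hand position in the toy simulator
    obj: float      # 1D object position
    contact: bool   # True when the hand is within contact distance of the object


def toy_policy(hand, obj, gain=0.5):
    """Stand-in for an RL-trained policy: command a velocity toward the object."""
    return gain * (obj - hand)


def rollout(obj_pos, steps=50, dt=1.0, contact_eps=0.01, seed=0):
    """Roll the policy out in a toy 1D 'physics' loop and record every frame."""
    rng = random.Random(seed)
    hand = rng.uniform(-1.0, 1.0)  # random start pose for diversity
    frames = []
    for _ in range(steps):
        hand += toy_policy(hand, obj_pos) * dt
        frames.append(Frame(hand, obj_pos, abs(hand - obj_pos) < contact_eps))
    return frames


def retarget(frames, scale=1.0, offset=0.0):
    """Placeholder retargeting: map simulator coordinates into the body-model frame."""
    return [Frame(f.hand * scale + offset, f.obj * scale + offset, f.contact)
            for f in frames]


def build_dataset(object_positions):
    """One retargeted rollout per object placement."""
    return [retarget(rollout(p, seed=i)) for i, p in enumerate(object_positions)]


dataset = build_dataset([0.2, 0.8, -0.5])
# Next stage (not shown): train a generative model, e.g. a diffusion model, on `dataset`.
print(len(dataset), "sequences,", len(dataset[0]), "frames each")
```

In a real setup the rollout loop would call an actual physics engine and a learned policy; the point here is only that policy rollouts, once retargeted, become ordinary training sequences.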

In practice

Use physics simulators to augment HOI datasets.
Implement coarse-to-fine retargeting to map simulator body models onto standard parametric body models.
Train generative models on RL-generated synthetic data.
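
This summary does not describe how the paper's coarse-to-fine retargeting works, so the following is only a generic illustration of the coarse-to-fine idea. It assumes retargeting is posed as fitting joint angles so that a target skeleton's hand reaches a position taken from the simulator: a coarse grid search picks a rough pose, then gradient descent refines it. The planar two-link arm, the link lengths, the target point, and all function names are hypothetical.

```python
import math

UPPER_ARM, FOREARM = 0.30, 0.25  # hypothetical link lengths in metres


def hand_position(shoulder, elbow):
    """Forward kinematics of a planar two-link arm rooted at the origin."""
    x = UPPER_ARM * math.cos(shoulder) + FOREARM * math.cos(shoulder + elbow)
    y = UPPER_ARM * math.sin(shoulder) + FOREARM * math.sin(shoulder + elbow)
    return x, y


def error(angles, target):
    """Squared distance between the arm's hand and the target position."""
    x, y = hand_position(*angles)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2


def coarse_fit(target, n=12):
    """Coarse stage: best pose on an n x n grid of (shoulder, elbow) angles."""
    grid = [-math.pi + 2 * math.pi * i / n for i in range(n)]
    return min(((s, e) for s in grid for e in grid),
               key=lambda a: error(a, target))


def fine_fit(angles, target, steps=500, lr=0.5, h=1e-6):
    """Fine stage: gradient descent using forward-difference gradients."""
    s, e = angles
    for _ in range(steps):
        base = error((s, e), target)
        grad_s = (error((s + h, e), target) - base) / h
        grad_e = (error((s, e + h), target) - base) / h
        s, e = s - lr * grad_s, e - lr * grad_e
    return s, e


target = (0.35, 0.20)  # hypothetical hand position from one simulator frame
coarse = coarse_fit(target)
fine = fine_fit(coarse, target)
print(f"coarse error: {error(coarse, target):.2e}")
print(f"fine error:   {error(fine, target):.2e}")
```

The coarse stage keeps the refinement away from poor starting poses, and the fine stage removes the residual error that the grid resolution leaves behind. A full-body retargeter would fit many joints and add terms such as contact or temporal smoothness, which this sketch omits.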

Topics

Human-Object Interaction
Diffusion Models
Physics Simulation
Reinforcement Learning
Generative Models
Data Augmentation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.