In-Context Model Predictive Generation: Open-Vocabulary Motion Synthesis from Language Models to Physics

2026-06-25 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

In-Context Model Predictive Generation (ICMPG) is a framework for synthesizing human motion from textual descriptions. It targets the long-standing trade-off between semantic fidelity and physical realism. Existing large language model (LLM)-based approaches can interpret diverse open-vocabulary instructions but often produce physically implausible motions. Physics-aware models produce realistic motions but struggle with semantically complex commands.

ICMPG combines LLM planning with inference-time physical feedback, recasting motion synthesis as a process similar to Model Predictive Control (MPC). It has two modules. The Context-Aware Motion Generation (CAMG) module uses an LLM as a planner to decompose a command and generate candidate motion sequences. The Model Predictive Generation (MPG) module evaluates these candidates through physical simulation and semantic alignment, then selects the best sequence. This closed-loop refinement lets ICMPG adapt motions to both the input semantics and the simulated physical environment without task-specific policy retraining.

In experiments, the authors report that ICMPG generalizes robustly to diverse commands. Its motions are more physically plausible and semantically faithful than those of representative baselines in both standard and zero-shot open-vocabulary settings.
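The MPC analogy can be made concrete with a small generate-simulate-select loop. This is a minimal sketch under stated assumptions, not the authors' implementation. `propose` is a hypothetical stand-in for the CAMG planner, and `rollout` is a hypothetical stand-in for the MPG module's simulation and scoring. Each iteration scores several candidates from the current state and commits only the winner. It then re-plans from the state the simulator actually reached, which makes the loop closed.

```python
from typing import Any, Callable, Sequence

# Hypothetical interfaces (not from the paper):
#   propose(state, k) -> k candidate motion segments (stand-in for the CAMG/LLM planner)
#   rollout(state, segment) -> (state after simulating segment, score) (stand-in for MPG)
Propose = Callable[[Any, int], Sequence[Any]]
Rollout = Callable[[Any, Any], tuple[Any, float]]


def receding_horizon_generate(
    state: Any,
    propose: Propose,
    rollout: Rollout,
    num_steps: int,
    num_candidates: int,
) -> list[Any]:
    """Generate-simulate-select loop that re-plans from the simulated state each step."""
    committed: list[Any] = []
    for _ in range(num_steps):
        candidates = propose(state, num_candidates)
        # Score every candidate by rolling it out from the current simulated state.
        rollouts = [(rollout(state, seg), seg) for seg in candidates]
        (next_state, _score), best = max(rollouts, key=lambda item: item[0][1])
        # Commit only the winning segment; the next proposal sees its physical outcome.
        committed.append(best)
        state = next_state
    return committed


# Toy usage: integer "states", moves of -1..2, reward for approaching a target of 3.
plan = receding_horizon_generate(
    state=0,
    propose=lambda s, k: [-1, 0, 1, 2][:k],
    rollout=lambda s, d: (s + d, -abs(s + d - 3)),
    num_steps=4,
    num_candidates=4,
)
print(plan)  # [2, 1, 0, 0]
```

In a real system, the state would be the simulated character's physical state, and each segment would be a short motion clip.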

Key takeaway

For machine learning engineers building text-to-motion systems, ICMPG suggests pairing large language model planning with physics simulation used as inference-time feedback. The LLM handles open-vocabulary commands, while simulation screens candidate motions for physical plausibility. The resulting motions can therefore be both semantically faithful and physically realistic. Because the feedback is applied at inference time, new commands do not require task-specific policy retraining. This makes motion synthesis more versatile and controllable for immersive digital applications.

Key insights

ICMPG integrates LLM planning with physical simulation for realistic, semantically faithful open-vocabulary motion synthesis.

Principles

Semantic fidelity and physical realism call for complementary mechanisms: language-level planning and physics-level feedback.
Closed-loop refinement improves motion adaptation.
LLMs can serve as effective high-level planners.
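The summary says the CAMG planner uses an LLM to decompose commands, but it does not describe the prompt or the output format. The sketch below assumes, hypothetically, that the planner is asked to reply with a JSON list of sub-actions and durations. It shows how such a reply could be validated before candidate motions are generated from it. `PLANNER_PROMPT`, `SubAction`, and `parse_plan` are illustrative names and are not part of ICMPG.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt; ICMPG's actual prompt format is not given in the summary.
PLANNER_PROMPT = (
    "Decompose the motion command into short, physically executable sub-actions. "
    'Reply only with a JSON list of objects with keys "action" (string) and '
    '"duration_s" (seconds).\nCommand: {command}'
)


@dataclass(frozen=True)
class SubAction:
    action: str
    duration_s: float


def parse_plan(llm_reply: str) -> list[SubAction]:
    """Validate a planner reply and convert it into ordered sub-actions."""
    items = json.loads(llm_reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(items, list) or not items:
        raise ValueError("planner reply must be a non-empty JSON list")
    plan = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"sub-action must be an object: {item!r}")
        action = item.get("action")
        duration = item.get("duration_s")
        if not isinstance(action, str) or not action.strip():
            raise ValueError(f"missing action: {item!r}")
        if not isinstance(duration, (int, float)) or duration <= 0:
            raise ValueError(f"invalid duration: {item!r}")
        plan.append(SubAction(action.strip(), float(duration)))
    return plan


prompt = PLANNER_PROMPT.format(command="walk forward, then wave with the right hand")
reply = '[{"action": "walk forward", "duration_s": 2.0}, {"action": "wave right hand", "duration_s": 1.5}]'
print(parse_plan(reply))
```

Validating the plan early means a malformed LLM reply fails before any simulation time is spent on it.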

Method

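ICMPG uses an MPC-like process. The CAMG module uses an LLM as a planner to decompose the command and generate candidate motion sequences. The MPG module then evaluates each candidate in a physical simulator, scores it with a composite reward that combines physical plausibility and semantic alignment, and selects the best sequence.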
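The summary does not give the exact form of the composite reward. A common choice, assumed here, is a weighted sum of a physics term and a semantic term. In this sketch the physics term decays with the tracking error between the planned and simulated poses and drops to zero if the character falls. The semantic term is the cosine similarity between a text embedding and a motion embedding. The candidate dictionaries, weights `w_phys` and `w_sem`, and the embeddings are hypothetical placeholders. Simulated poses would come from a physics simulator, and embeddings from some text-motion embedding model, neither of which is shown.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def physics_reward(reference: list[float], simulated: list[float], fell: bool) -> float:
    """1.0 for perfect tracking, decaying with mean squared error; 0.0 if the character fell."""
    if fell:
        return 0.0
    mse = sum((r - s) ** 2 for r, s in zip(reference, simulated)) / len(reference)
    return math.exp(-mse)


def composite_reward(cand: dict, text_emb: list[float], w_phys: float = 0.5, w_sem: float = 0.5) -> float:
    """Hypothetical weighted sum of physical plausibility and semantic alignment."""
    phys = physics_reward(cand["reference"], cand["simulated"], cand["fell"])
    sem = cosine(cand["motion_emb"], text_emb)
    return w_phys * phys + w_sem * sem


def select_best(cands: list[dict], text_emb: list[float]) -> dict:
    """Pick the candidate with the highest composite reward."""
    return max(cands, key=lambda c: composite_reward(c, text_emb))


text_emb = [1.0, 0.0]
cands = [
    {"name": "stable_off_topic", "reference": [0.0, 0.0], "simulated": [0.0, 0.0],
     "fell": False, "motion_emb": [0.0, 1.0]},
    {"name": "on_topic_but_falls", "reference": [0.0, 0.0], "simulated": [0.0, 0.0],
     "fell": True, "motion_emb": [1.0, 0.0]},
    {"name": "balanced", "reference": [0.0, 0.0], "simulated": [0.1, 0.1],
     "fell": False, "motion_emb": [1.0, 1.0]},
]
print(select_best(cands, text_emb)["name"])  # balanced
```

The toy candidates show the trade-off the framework targets. A stable but off-topic motion and an on-topic motion that falls both score 0.5, while a slightly imperfect motion that is both stable and on topic wins.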

In practice

Combine LLM planning with physics simulation.
Use inference-time physical feedback for refinement.
Adapt LLM backbones for diverse motion tasks.

Topics

Human Motion Synthesis
Large Language Models
Model Predictive Control
Physical Simulation
Open-Vocabulary Generation
Text-to-Motion

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.