In-Context Model Predictive Generation: Open-Vocabulary Motion Synthesis from Language Models to Physics

2026-06-25 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

The In-Context Model Predictive Generation (ICMPG) framework addresses the persistent trade-off between semantic fidelity and physical realism when synthesizing human motion from text. It integrates language model planning with inference-time physical feedback, reformulating motion synthesis as a Model Predictive Control (MPC)-like process. It comprises two modules: the Context-Aware Motion Generation (CAMG) module, which uses an LLM to plan and generate candidate motion sequences, and the Model Predictive Generation (MPG) module, which evaluates these candidates through physical simulation and semantic alignment and selects the best sequence. This closed-loop refinement lets ICMPG adapt motions to both the input semantics and the simulated physical environment without task-specific policy retraining. The reported experiments show that ICMPG generalizes to diverse open-vocabulary commands, producing motions that are more physically plausible and semantically faithful than representative baselines.

Key takeaway

For robotics engineers developing text-to-motion systems, ICMPG offers one way to resolve the tension between semantic fidelity and physical realism. Consider pairing LLM-based planning with inference-time physical simulation to get more physically plausible motion from open-vocabulary commands. Because candidates are checked against the simulated environment at inference time, motions adapt to its physics without task-specific policy retraining, which improves the realism and semantic faithfulness of generated actions for immersive applications.

Key insights

ICMPG integrates LLM planning with physical simulation for robust, realistic, and semantically faithful open-vocabulary motion synthesis.

Principles

Closed-loop physical feedback enhances LLM-driven generation.
Decompose complex commands into motion tokens for planning.
Combine semantic alignment with physical simulation for evaluation.
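
The third principle can be illustrated with a minimal Python sketch. It is not the authors' scoring function: the `Candidate` fields, the [0, 1] score ranges, and the weighted sum with `w_semantic` are all assumptions made for illustration, since the summary does not say how ICMPG combines the two signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A candidate motion sequence with two hypothetical scores in [0, 1]."""
    name: str
    semantic_score: float  # assumed text-to-motion alignment score
    physics_score: float   # assumed plausibility score from a simulator rollout


def combined_score(c: Candidate, w_semantic: float = 0.5) -> float:
    # Assumption: a simple convex combination of the two signals.
    return w_semantic * c.semantic_score + (1.0 - w_semantic) * c.physics_score


def select_best(candidates: List[Candidate], w_semantic: float = 0.5) -> Candidate:
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: combined_score(c, w_semantic))


candidates = [
    Candidate("faithful_but_unstable", semantic_score=0.9, physics_score=0.2),
    Candidate("balanced", semantic_score=0.7, physics_score=0.8),
    Candidate("stable_but_off_topic", semantic_score=0.3, physics_score=0.9),
]
best = select_best(candidates)
print(best.name)
```

With equal weights the balanced candidate wins. Setting `w_semantic=1.0` ignores physics and picks the semantically closest but unstable motion, which is the failure mode the combined evaluation is meant to avoid.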

Method

ICMPG's CAMG module uses an LLM to plan and generate candidate motion sequences from text. The MPG module then evaluates these candidates through physical simulation and semantic alignment, selecting the best one to carry into subsequent steps.
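
The generate, evaluate, select loop described in this section can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the paper's implementation: `predictive_generation`, `toy_propose`, and `toy_score` are hypothetical names, the proposer stands in for the LLM-based CAMG module, and the scorer stands in for MPG's simulation and alignment checks.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a proposer returns candidate motion tokens given the
# command and the motion so far; a scorer rates one candidate in that context.
Proposer = Callable[[str, List[str]], Sequence[str]]
Scorer = Callable[[str, List[str], str], float]


def predictive_generation(command: str, propose: Proposer,
                          score: Scorer, steps: int) -> List[str]:
    """MPC-like loop: at each step propose candidates, keep the best one."""
    history: List[str] = []
    for _ in range(steps):
        candidates = propose(command, history)
        if not candidates:
            break
        best = max(candidates, key=lambda c: score(command, history, c))
        history.append(best)  # the chosen token becomes context for the next step
    return history


def toy_propose(command: str, history: List[str]) -> Sequence[str]:
    # Stand-in for an LLM: always offers the same three motion tokens.
    return ["step", "jump", "float"]


def toy_score(command: str, history: List[str], candidate: str) -> float:
    semantic = 1.0 if candidate in command else 0.0   # crude keyword match
    physical = 0.0 if candidate == "float" else 1.0   # "float" fails the fake physics check
    repeat_penalty = 1.5 if history and history[-1] == candidate else 0.0
    return semantic + physical - repeat_penalty


motion = predictive_generation("jump forward", toy_propose, toy_score, steps=3)
print(motion)
```

Because selection happens at inference time, swapping in a different proposer (for example, another LLM backbone) or a different scorer requires no retraining of the loop itself.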

In practice

Synthesize human motion for immersive digital applications.
Generate physically plausible motions from diverse text commands.
Incorporate different LLM backbones for versatility.

Topics

In-Context Model Predictive Generation
Motion Synthesis
Large Language Models
Model Predictive Control
Physical Simulation
Open-Vocabulary Generation

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.