LPM 1.0: Video-based Character Performance Model

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

LPM 1.0 (Large Performance Model) is a video-based character performance model for single-person, full-duplex audio-visual conversation. It targets the "performance trilemma": achieving high expressiveness, real-time inference, and long-horizon identity stability at the same time. The system has three parts: a multimodal human-centric dataset; a 17B-parameter Diffusion Transformer ("Base LPM") that produces controllable, identity-consistent performance through multimodal conditioning; and a distilled causal streaming generator ("Online LPM") for low-latency, infinite-length interaction. LPM 1.0 generates listening video from the user's audio and speaking video from synthesized audio, accepts text prompts for motion control, and runs at real-time speed. The work also introduces LPM-Bench, presented as the first benchmark for interactive character performance, on which LPM 1.0 is reported to achieve state-of-the-art results.

Key takeaway

For research scientists building interactive character systems, LPM 1.0 indicates that high-quality, full-duplex conversational performance is practical under deployable latency and stability constraints. The lesson for your own projects is its systems-level co-design: data, multimodal conditioning, and streaming are developed together, rather than relying on model architecture alone, to get past the trade-offs among expressiveness, real-time speed, and stability.

Key insights

LPM 1.0 resolves the "performance trilemma" for conversational AI by integrating data, multimodal conditioning, and streaming for real-time, expressive, and stable character video.

Method

LPM 1.0 combines three components: a multimodal human-centric dataset, a 17B-parameter Diffusion Transformer (Base LPM) trained on it for controllable performance, and a causal streaming generator (Online LPM) distilled from the base model for real-time, infinite-length interaction.
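The summary does not describe how the streaming generator works internally, so the following is only a minimal sketch of what a causal, chunk-by-chunk generation loop could look like. Every name in it (`Chunk`, `denoise_chunk`, `stream`, `max_context`) is hypothetical, the model call is a stub that returns a string label, and the bounded history window is an assumption about how unbounded-length output might run in constant memory, not a detail taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator, List, Optional


@dataclass
class Chunk:
    """Conditioning for one generation step (all fields are hypothetical)."""
    user_audio: Optional[bytes]       # incoming user speech, drives listening
    character_audio: Optional[bytes]  # synthesized speech, drives speaking
    text_prompt: str = ""             # optional motion-control prompt


def denoise_chunk(history: List[str], chunk: Chunk, mode: str) -> str:
    """Stub standing in for a distilled generator; returns a frame label."""
    return f"{mode}-frame(ctx={len(history)})"


def stream(chunks: Iterable[Chunk], max_context: int = 4) -> Iterator[str]:
    """Generate one output per input chunk, looking only at past outputs."""
    # Assumption: a fixed-size window keeps memory constant however long
    # the conversation runs. The oldest entry is dropped automatically.
    history: Deque[str] = deque(maxlen=max_context)
    for chunk in chunks:
        # Speaking when synthesized audio is present, otherwise listening.
        mode = "speaking" if chunk.character_audio is not None else "listening"
        frame = denoise_chunk(list(history), chunk, mode)  # causal: past only
        history.append(frame)
        yield frame


if __name__ == "__main__":
    turns = [Chunk(b"user", None), Chunk(None, b"reply", "nod gently")]
    for frame in stream(turns):
        print(frame)
```

The loop is a generator, so frames can be consumed as they are produced instead of after the whole conversation ends, which is the property a low-latency interactive system needs.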

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.