LPM 1.0: Video-based Character Performance Model
Summary
LPM 1.0 (Large Performance Model) is a video-based character performance model for single-person, full-duplex audio-visual conversation. It targets the "performance trilemma": achieving high expressiveness, real-time inference, and long-horizon identity stability at the same time. The system has three components: a multimodal, human-centric dataset; a 17B-parameter Diffusion Transformer ("Base LPM") that produces controllable, identity-consistent performance through multimodal conditioning; and a distilled causal streaming generator ("Online LPM") for low-latency, infinite-length interaction. LPM 1.0 generates listening videos driven by the user's audio and speaking videos driven by synthesized audio, accepts text prompts for motion control, and runs in real time. The work also introduces LPM-Bench, presented as the first benchmark for interactive character performance, on which LPM 1.0 reports state-of-the-art results.
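The listening/speaking split can be pictured as a per-chunk routing decision: which audio stream should condition the generator right now? The sketch below is a minimal illustration under stated assumptions, not LPM 1.0's actual logic. The `Mode` enum, the `AudioChunk` type, the `select_conditioning` function, and the energy-based voice-activity rule are all hypothetical, and the policy assumes audio arrives in fixed-size chunks with the agent's own speech taking priority over the user's.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LISTENING = "listening"  # character reacts to the user's speech
    SPEAKING = "speaking"    # character performs its own synthesized speech
    IDLE = "idle"            # neither side is talking


@dataclass
class AudioChunk:
    user_audio: list[float]   # microphone samples for this chunk (hypothetical format)
    agent_audio: list[float]  # TTS samples for this chunk; empty if the agent is silent


def energy(samples: list[float]) -> float:
    # Mean absolute amplitude as a crude voice-activity proxy.
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def select_conditioning(chunk: AudioChunk, vad_threshold: float = 0.01) -> tuple[Mode, list[float]]:
    """Pick the performance mode and the audio stream that conditions the generator.

    Hypothetical policy: the agent's own speech wins; otherwise the character
    listens whenever the user's chunk exceeds vad_threshold.
    """
    if energy(chunk.agent_audio) > vad_threshold:
        return Mode.SPEAKING, chunk.agent_audio
    if energy(chunk.user_audio) > vad_threshold:
        return Mode.LISTENING, chunk.user_audio
    return Mode.IDLE, []


mode, cond = select_conditioning(AudioChunk(user_audio=[0.2, -0.3], agent_audio=[]))
print(mode.value)  # listening
```

A real full-duplex system would also need a policy for barge-in (both sides talking at once); letting the agent's speech win is only one simple choice.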
Key takeaway
For research scientists building interactive character systems, LPM 1.0 shows that high-quality, full-duplex conversational performance is achievable within deployable latency and stability constraints. To work around the trade-offs between expressiveness, real-time speed, and stability in your own projects, consider its systems-level co-design, which integrates data, multimodal conditioning, and streaming, rather than focusing on model architecture alone.
Key insights
LPM 1.0 resolves the "performance trilemma" in conversational AI by co-designing data, multimodal conditioning, and streaming to produce character video that is expressive, real-time, and stable.
Principles
- Conversation is a performance.
- Performance requires identity stability.
- Systems-level co-design resolves trade-offs.
Method
LPM 1.0 uses a multimodal dataset, trains a 17B-parameter Diffusion Transformer for controllable performance, and distills it into a causal streaming generator for real-time, infinite-length interaction.
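One way to see how a causal generator can run indefinitely is a chunk loop with bounded memory. The following is a hypothetical sketch, not the Online LPM implementation: `stream_performance`, `generate_chunk`, the fixed identity `reference`, and the sliding `window` of past chunks are assumptions chosen to illustrate two properties, causality (each chunk sees only past outputs and the current audio) and constant memory over unbounded input. `fake_generator` stands in for the distilled model.

```python
from collections import deque
from itertools import count, islice
from typing import Callable, Iterator

# Hypothetical signature: (identity reference, recent chunks, current audio) -> new video chunk.
ChunkFn = Callable[[str, list[str], str], str]


def stream_performance(
    generate_chunk: ChunkFn,
    reference: str,
    audio_chunks: Iterator[str],
    window: int = 4,
) -> Iterator[str]:
    """Generate video chunks causally with bounded memory.

    Each chunk is conditioned only on a fixed identity reference, the last
    `window` generated chunks, and the current audio chunk; future audio is never seen.
    """
    history: deque = deque(maxlen=window)  # oldest chunks are evicted automatically
    for audio in audio_chunks:
        chunk = generate_chunk(reference, list(history), audio)
        history.append(chunk)
        yield chunk  # emit immediately so playback can start before the stream ends


def fake_generator(reference: str, context: list[str], audio: str) -> str:
    # Stand-in for a distilled model: records how much past context it received.
    return f"{reference}|ctx={len(context)}|{audio}"


# An unbounded audio stream; islice pulls only as many chunks as we ask for.
audio_stream = (f"a{i}" for i in count())
first = list(islice(stream_performance(fake_generator, "ref", audio_stream, window=2), 5))
print(first[-1])  # ref|ctx=2|a4
```

Because `history` is a `deque` with `maxlen`, memory stays constant however long the conversation runs. Keeping the identity reference outside the sliding window is one plausible way to counter long-horizon drift; the summary does not say how LPM 1.0 handles this.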
In practice
- Use LPM 1.0 for conversational agents.
- Apply to live streaming characters.
- Integrate into game NPCs.
Topics
- LPM 1.0
- Video-based Character Performance
- Performance Trilemma
- Diffusion Transformer
- Full-duplex Conversational AI
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.