MUA: Mobile Ultra-detailed Animatable Avatars
Summary
Heming Zhu, Guoxing Sun, and Marc Habermann introduce MUA (Mobile Ultra-detailed Animatable Avatars), a representation and distillation pipeline for creating photorealistic, animatable full-body digital humans on resource-constrained platforms. Existing methods either achieve high fidelity at the cost of substantial server-class GPU computation, or are lightweight but lack detail and suffer from artifacts. MUA bridges this gap with Wavelet-guided Multi-level Spatial Factorized Blendshapes and a distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance from a high-quality teacher model into a compact, efficient representation. Compared to the teacher model, this approach achieves up to 2000X lower computational cost and a 10X smaller model size while maintaining visually plausible dynamics and appearance. MUA runs at over 180 FPS on a desktop PC and at 24 FPS natively on a Meta Quest 3.
Key takeaway
For developers building immersive applications that need high-fidelity digital humans on mobile VR/AR platforms, MUA shows that visually rich, animatable avatars can run in real time on standalone devices like the Meta Quest 3 (24 FPS natively). The reported reductions in computational cost and model size relative to the teacher model are what make this on-device deployment feasible.
Key insights
MUA enables high-fidelity, animatable avatars on mobile devices by distilling complex dynamics into an efficient, wavelet-guided representation.
Principles
- Combine multi-level wavelet decomposition with low-rank factorization.
- Distill high-quality avatar details into compact representations.
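The first principle can be illustrated with a small NumPy sketch. It is not the paper's implementation: the Haar wavelet, the truncated SVD, the map sizes, and all function names (`haar2d`, `factorize`, `compress`, `decompress`) are assumptions chosen for illustration. The idea shown is that a stack of per-frame detail maps is split into wavelet bands at several levels, and each band is stored as a small set of per-frame coefficients times a shared basis.

```python
import numpy as np

def haar2d(x):
    """One level of an orthonormal 2D Haar transform over the last two axes."""
    s = np.sqrt(2.0)
    a = (x[..., 0::2, :] + x[..., 1::2, :]) / s   # low-pass along rows
    d = (x[..., 0::2, :] - x[..., 1::2, :]) / s   # high-pass along rows
    ll = (a[..., :, 0::2] + a[..., :, 1::2]) / s
    lh = (a[..., :, 0::2] - a[..., :, 1::2]) / s
    hl = (d[..., :, 0::2] + d[..., :, 1::2]) / s
    hh = (d[..., :, 0::2] - d[..., :, 1::2]) / s
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    """Exact inverse of haar2d."""
    lh, hl, hh = details
    s = np.sqrt(2.0)
    a = np.empty(ll.shape[:-1] + (ll.shape[-1] * 2,))
    d = np.empty_like(a)
    a[..., :, 0::2], a[..., :, 1::2] = (ll + lh) / s, (ll - lh) / s
    d[..., :, 0::2], d[..., :, 1::2] = (hl + hh) / s, (hl - hh) / s
    x = np.empty(a.shape[:-2] + (a.shape[-2] * 2, a.shape[-1]))
    x[..., 0::2, :], x[..., 1::2, :] = (a + d) / s, (a - d) / s
    return x

def factorize(band, rank):
    """Truncated SVD: band (T, h, w) -> per-frame coeffs (T, rank), shared basis (rank, h*w)."""
    m = band.reshape(band.shape[0], -1)
    u, sv, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank] * sv[:rank], vt[:rank]

def compress(frames, levels, rank):
    """Multi-level wavelet decomposition, then a low-rank factorization of every band."""
    bands, ll = [], frames
    for _ in range(levels):
        ll, details = haar2d(ll)
        bands.append([factorize(b, rank) + (b.shape[1:],) for b in details])
    return factorize(ll, rank) + (ll.shape[1:],), bands

def expand(factor):
    coeffs, basis, shape = factor
    return (coeffs @ basis).reshape((coeffs.shape[0],) + shape)

def decompress(coarse, bands):
    ll = expand(coarse)
    for level in reversed(bands):          # coarsest level first
        ll = ihaar2d(ll, tuple(expand(f) for f in level))
    return ll

# Synthetic stand-in data: 20 frames of 16x16 maps built from 3 shared patterns.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((3, 16, 16))
weights = rng.standard_normal((20, 3))
frames = np.tensordot(weights, patterns, axes=1)   # shape (20, 16, 16)

coarse, bands = compress(frames, levels=2, rank=3)
recon = decompress(coarse, bands)
max_err = float(np.abs(recon - frames).max())
stored = sum(c.size + b.size for c, b, _ in [coarse] + [f for lvl in bands for f in lvl])
print(f"max reconstruction error: {max_err:.2e}, stored values: {stored} vs raw {frames.size}")
```

Because the synthetic frames are built from three patterns, a rank of 3 per band reconstructs them almost exactly. Real avatar data would not be exactly low-rank, so the rank per level becomes a quality versus size trade-off.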
Method
The method uses Wavelet-guided Multi-level Spatial Factorized Blendshapes, coupled with a distillation pipeline, to transfer motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model.
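To make the distillation idea concrete, here is a minimal NumPy sketch under strong simplifying assumptions. The `teacher` function is a hypothetical stand-in (a tiny nonlinear map from pose to per-vertex offsets), not the pre-trained avatar model, and the student is a plain linear blendshape model fitted with SVD and least squares rather than the wavelet-guided representation described above. It only shows the general pattern: query the expensive model, then fit a compact model to its outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
D, P, HIDDEN, RANK = 6, 300, 8, 8   # pose dims, output dims, teacher width, student rank (all assumed)

# Hypothetical teacher: stands in for an expensive, high-quality avatar model.
A = 0.3 * rng.standard_normal((D, HIDDEN))
B = rng.standard_normal((HIDDEN, P))

def teacher(pose):
    """pose (N, D) -> per-vertex offsets (N, P)."""
    return np.tanh(pose @ A) @ B

def fit_student(poses, rank):
    """Distill teacher outputs into a mean shape, a blendshape basis, and a pose-to-coefficient map."""
    targets = teacher(poses)                       # query the teacher once, offline
    mean = targets.mean(axis=0)
    _, _, vt = np.linalg.svd(targets - mean, full_matrices=False)
    basis = vt[:rank]                              # (rank, P) blendshapes
    coeffs = (targets - mean) @ basis.T            # (N, rank) target coefficients
    feats = np.hstack([poses, np.ones((len(poses), 1))])
    weights, *_ = np.linalg.lstsq(feats, coeffs, rcond=None)
    return mean, basis, weights

def student(pose, mean, basis, weights):
    """Cheap inference: one small matrix product for coefficients, one for the blend."""
    feats = np.hstack([pose, np.ones((len(pose), 1))])
    return mean + (feats @ weights) @ basis

train = rng.uniform(-0.5, 0.5, (200, D))
held_out = rng.uniform(-0.5, 0.5, (50, D))
mean, basis, weights = fit_student(train, RANK)

student_err = float(np.abs(student(held_out, mean, basis, weights) - teacher(held_out)).mean())
baseline_err = float(np.abs(mean - teacher(held_out)).mean())   # always predicting the mean shape
print(f"student error: {student_err:.4f}, mean-shape baseline: {baseline_err:.4f}")
```

The student here never sees ground-truth captures, only teacher outputs, which is the defining property of distillation. How the actual pipeline supervises dynamics and appearance is not described in this summary.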
In practice
- Up to 2000X lower computational cost than the teacher model.
- Over 180 FPS on a desktop PC and 24 FPS natively on a Meta Quest 3.
- 10X smaller model size than the teacher model.
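The frame rates quoted in the summary translate directly into per-frame time budgets. The short calculation below is simple arithmetic on those reported numbers; the helper name `frame_budget_ms` is made up for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

quest_ms = frame_budget_ms(24)      # Meta Quest 3, native
desktop_ms = frame_budget_ms(180)   # desktop PC (reported as "over 180 FPS", so this is an upper bound)

print(f"Quest 3:  {quest_ms:.2f} ms per frame")    # 41.67 ms
print(f"Desktop: {desktop_ms:.2f} ms per frame")   # 5.56 ms
```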
Topics
- Animatable Avatars
- Digital Humans
- Wavelet-guided Blendshapes
- Model Distillation
- Computational Efficiency
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.