MUA: Mobile Ultra-detailed Animatable Avatars

2026-04-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Graphics & Vision, Gaming & Interactive Media · Depth: Expert, quick

Summary

Heming Zhu, Guoxing Sun, and Marc Habermann introduce MUA (Mobile Ultra-detailed Animatable Avatars), a novel representation and distillation pipeline for creating photorealistic, animatable full-body digital humans on resource-constrained platforms. Existing methods either achieve high fidelity at the cost of substantial server-class GPU computation, or are lightweight but lack detail and suffer from artifacts. MUA bridges this gap with Wavelet-guided Multi-level Spatial Factorized Blendshapes, which transfer motion-aware clothing dynamics and fine-grained appearance from a high-quality teacher model into a compact, efficient representation. Compared with the teacher model, the approach achieves up to 2000X lower computational cost and a 10X smaller model size while maintaining visually plausible dynamics and appearance. MUA runs at over 180 FPS on a desktop PC and at 24 FPS natively on a Meta Quest 3.

Key takeaway

For developers building immersive applications that need high-fidelity digital humans on mobile VR/AR platforms, MUA shows that detailed, animatable full-body avatars can run in real time on a standalone headset. The authors report 24 FPS natively on a Meta Quest 3, using a model about 10X smaller and up to 2000X cheaper to evaluate than the teacher it is distilled from. This eases the compute and model-size constraints that previously kept avatars of this quality on server-class GPUs.

Key insights

MUA enables high-fidelity, animatable avatars on mobile devices by distilling complex dynamics into an efficient, wavelet-guided representation.

Principles

Combine multi-level wavelet decomposition with low-rank factorization.
Distill high-quality avatar details into compact representations.
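
The first principle can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes 1D signals standing in for per-pose detail maps, uses a Haar wavelet for the multi-level split, and approximates each wavelet band with a single rank-1 factor found by power iteration. All function names (`haar_decompose`, `rank1_factor`, `compress`, `decompress`) are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling

def haar_decompose(signal, levels):
    """Multi-level Haar split. Returns [coarsest approx, details coarse -> fine]."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) * S for a, b in pairs])
        approx = [(a + b) * S for a, b in pairs]
    return [approx] + details[::-1]

def haar_reconstruct(bands):
    """Inverse of haar_decompose."""
    cur = list(bands[0])
    for detail in bands[1:]:
        nxt = []
        for a, d in zip(cur, detail):
            nxt += [(a + d) * S, (a - d) * S]
        cur = nxt
    return cur

def rank1_factor(matrix, iters=50):
    """Rank-1 approximation matrix ~ outer(u, v) via power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    # Start from the largest-norm row so the first product is non-zero.
    v = list(max(matrix, key=lambda r: sum(x * x for x in r)))
    u = [0.0] * rows
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return u, v

def compress(signals, levels):
    """Factor each wavelet band across poses: one (u, v) pair per band."""
    per_pose = [haar_decompose(s, levels) for s in signals]
    return [rank1_factor([bands[b] for bands in per_pose])
            for b in range(levels + 1)]

def decompress(factors, pose_index):
    """Rebuild one pose's signal from the per-band factors."""
    bands = [[u[pose_index] * x for x in v] for u, v in factors]
    return haar_reconstruct(bands)

# Toy data: three "poses" that scale one shared shape (exactly rank 1 per band).
base = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
signals = [[w * x for x in base] for w in (1.0, 0.5, -2.0)]
factors = compress(signals, levels=2)
max_err = max(abs(a - b)
              for i, s in enumerate(signals)
              for a, b in zip(decompress(factors, i), s))
print("max reconstruction error:", max_err)
```

Because the toy poses share one underlying shape, a single factor per band reconstructs them almost exactly. Real clothing dynamics would need a higher rank per band, and the appeal of a multi-level split is that each band can be given a different rank budget.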

Method

The method pairs Wavelet-guided Multi-level Spatial Factorized Blendshapes with a distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained, ultra-high-quality avatar model (the teacher) into the compact representation.
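
As a rough illustration of the teacher-to-student idea only, the sketch below distills a stand-in "teacher" function into a small linear "student" by regressing the student's output onto the teacher's. Everything here is an assumption made for the example: the scalar pose input, the `teacher` function, the sinusoidal feature basis, and the plain mean-squared-error objective. The paper's actual teacher, student parameterization, and losses are not described in this summary.

```python
import math

def teacher(pose):
    """Hypothetical stand-in for an expensive pre-trained avatar model."""
    return 0.8 * math.sin(pose) + 0.3 * math.cos(2.0 * pose)

def features(pose):
    """Small fixed basis the student may use (an assumption of this sketch)."""
    return [1.0, math.sin(pose), math.cos(pose),
            math.sin(2.0 * pose), math.cos(2.0 * pose)]

def student(weights, pose):
    """Cheap student: a dot product between weights and pose features."""
    return sum(w * x for w, x in zip(weights, features(pose)))

def distill(poses, steps=500, lr=0.2):
    """Fit student weights to teacher outputs by gradient descent on MSE."""
    targets = [teacher(p) for p in poses]   # teacher is queried once, offline
    feats = [features(p) for p in poses]
    n = len(poses)
    weights = [0.0] * len(feats[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for f, t in zip(feats, targets):
            err = sum(w * x for w, x in zip(weights, f)) - t
            for k, x in enumerate(f):
                grad[k] += 2.0 * err * x / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Training poses sampled evenly over one period.
train_poses = [2.0 * math.pi * i / 64 for i in range(64)]
weights = distill(train_poses)
print("student vs teacher at pose 1.234:",
      student(weights, 1.234), teacher(1.234))
```

The point of the pattern is that the expensive model is evaluated only while building training targets. At run time only the student's small set of weights is needed, which is what makes deployment on a headset plausible.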

In practice

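Achieves up to 2000X lower computational cost than the teacher model.
Runs at 24 FPS natively on a Meta Quest 3 and at over 180 FPS on a desktop PC.
Reduces model size by 10X relative to the teacher model.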
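
To put the reported frame rates in per-frame terms, the short calculation below converts FPS into a time budget per frame. It is simple arithmetic on the figures quoted above; the helper name `frame_budget_ms` is hypothetical.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# Figures reported in the summary above.
for label, fps in (("Meta Quest 3", 24), ("desktop PC", 180)):
    print(f"{label}: {fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
```

At 24 FPS a frame takes roughly 41.7 ms, while 180 FPS corresponds to roughly 5.6 ms per frame (a lower bound on speed, since the desktop figure is reported as "over 180 FPS").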

Topics

Animatable Avatars
Digital Humans
Wavelet-guided Blendshapes
Model Distillation
Computational Efficiency

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.