Building real-world on-device AI with LiteRT and NPU
Summary
LiteRT is a cross-platform framework for on-device AI acceleration across CPUs, GPUs, and Neural Processing Units (NPUs) on mobile, desktop, and IoT platforms. Released on April 23, 2026, LiteRT simplifies deploying high-speed AI features by hiding the complexity of vendor NPU SDKs behind a unified API. Google Meet uses LiteRT to deploy an Ultra-HD segmentation model that is 25x larger without sacrificing inference speed or power efficiency. Epic Games' Live Link Face app achieves real-time MetaHuman facial animation at 30 FPS on Android devices. Argmax Inc. uses LiteRT in its Pro SDK, which delivers a speedup of more than 2x for speech recognition when moving from GPU to NPU and supports models such as NVIDIA Parakeet TDT 0.6B v2. The Google AI Edge Gallery App now supports NPU acceleration for select Gemma models and includes benchmarking tools.
Key takeaway
For AI Architects and Computer Vision Engineers building on-device applications, LiteRT offers a critical solution for leveraging NPUs without vendor-specific code. You can deploy advanced models like Gemma 4 across mobile, desktop, and IoT, ensuring high performance and power efficiency. Explore the LiteRT documentation and GitHub repos to integrate NPU acceleration and validate performance using the Google AI Edge Gallery App and Portal.
Key insights
LiteRT enables efficient, cross-platform on-device AI by unifying NPU, GPU, and CPU acceleration.
Principles
- Abstract hardware complexity for AI deployment.
- Prioritize power efficiency for sustained on-device AI.
- Optimize model delivery for device-specific NPUs.
Method
LiteRT provides a unified API for CPU, GPU, and NPU acceleration, supporting both Just-In-Time (JIT) and Ahead-Of-Time (AOT) compilation to streamline on-device AI deployment across diverse hardware.
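The idea of one entry point over several accelerators can be sketched in a few lines. The snippet below is a minimal illustration and not the LiteRT API: `Accelerator`, `BackendUnavailable`, `LoadedModel`, and `load_with_fallback` are hypothetical names invented for this example, and the loaders stand in for whatever vendor-specific setup a real runtime would hide. It shows only the general pattern of trying the NPU first and falling back to GPU, then CPU.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Sequence


class Accelerator(Enum):
    """Hypothetical accelerator identifiers (not LiteRT's own enum)."""
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


class BackendUnavailable(Exception):
    """Raised by a hypothetical loader when its hardware cannot be used."""


InferenceFn = Callable[[Sequence[float]], List[float]]


@dataclass
class LoadedModel:
    accelerator: Accelerator  # backend that actually loaded the model
    run: InferenceFn          # callable that performs inference


def load_with_fallback(
    loaders: Dict[Accelerator, Callable[[], InferenceFn]],
    preference: Sequence[Accelerator] = (
        Accelerator.NPU, Accelerator.GPU, Accelerator.CPU),
) -> LoadedModel:
    """Try each backend in preference order and return the first that loads."""
    errors = []
    for accelerator in preference:
        loader = loaders.get(accelerator)
        if loader is None:
            continue  # no loader registered for this backend
        try:
            return LoadedModel(accelerator, loader())
        except BackendUnavailable as exc:
            errors.append(f"{accelerator.value}: {exc}")
    raise RuntimeError("no usable accelerator: " + "; ".join(errors))
```

With stub loaders, a device that lacks an NPU would end up on the GPU path while the calling code stays the same. That is the property a unified API is meant to provide.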
In practice
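- Use LiteRT for real-time video effects such as background segmentation.
- Deploy larger models (25x larger in the Google Meet case) without losing inference speed or power efficiency.
- Sustain real-time frame rates for animation (30 FPS in the Live Link Face case).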
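To put the 30 FPS figure reported for Live Link Face in latency terms, a frame rate can be converted into a per-frame time budget. The sketch below is simple arithmetic, not part of LiteRT. The helper names and the stage latencies in the usage note are hypothetical values chosen for illustration, not measurements from the article.

```python
from typing import Sequence


def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fits_budget(stage_latencies_ms: Sequence[float], fps: float) -> bool:
    """Check whether the summed per-frame stage latencies fit the budget."""
    return sum(stage_latencies_ms) <= frame_budget_ms(fps)
```

At 30 FPS the budget is about 33.3 ms per frame. A hypothetical pipeline with 5 ms of preprocessing, 20 ms of inference, and 5 ms of rendering fits (30 ms total), whereas 10 ms of preprocessing with the same other stages does not (35 ms total).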
Topics
- LiteRT Framework
- Neural Processing Units
- On-Device AI
- Cross-Platform Acceleration
- Real-time AI Applications
Code references
- google-ai-edge/gallery
- google-ai-edge/LiteRT-LM
- google-ai-edge/litert-samples
- google-ai-edge/LiteRT
Best for: Computer Vision Engineer, AI Architect, NLP Engineer, AI Engineer, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Google Developers Blog - AI.