Building real-world on-device AI with LiteRT and NPU

2026-04-23 · Source: Google Developers Blog - AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Internet of Things (IoT) & Connected Devices, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

LiteRT is a cross-platform framework for on-device AI acceleration across CPU, GPU, and Neural Processing Units (NPUs) on mobile, desktop, and IoT platforms. According to the article, published on April 23, 2026, LiteRT simplifies deploying high-speed AI features by hiding the complexity of vendor NPU SDKs behind a unified API. Google Meet uses LiteRT to deploy a 25x larger Ultra-HD segmentation model without sacrificing inference speed or power efficiency. Epic Games' Live Link Face app achieves 30 FPS real-time MetaHuman facial animation on Android devices. Argmax Inc. builds its Pro SDK on LiteRT, reporting over 2x speedup when moving speech recognition from GPU to NPU and supporting models such as NVIDIA Parakeet TDT 0.6B v2. The Google AI Edge Gallery App now supports NPU acceleration for select Gemma models and includes benchmarking tools.

Key takeaway

For AI Architects and Computer Vision Engineers building on-device applications, LiteRT lets you target NPUs without writing vendor-specific code. You can deploy advanced models like Gemma 4 across mobile, desktop, and IoT while keeping inference fast and power-efficient. Explore the LiteRT documentation and GitHub repos to integrate NPU acceleration, then validate performance with the Google AI Edge Gallery App and Portal.

Key insights

LiteRT enables efficient, cross-platform on-device AI by unifying NPU, GPU, and CPU acceleration.

Principles

Abstract hardware complexity for AI deployment.
Prioritize power efficiency for sustained on-device AI.
Optimize model delivery for device-specific NPUs.

Method

LiteRT provides a unified API for CPU, GPU, and NPU acceleration, supporting both Just-In-Time (JIT) and Ahead-Of-Time (AOT) compilation to streamline on-device AI deployment across diverse hardware.
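
To make the idea concrete, here is a minimal sketch of the kind of backend selection a unified API takes off the developer's hands: prefer the NPU, fall back to GPU and then CPU, and use a precompiled (AOT) model when one was shipped for that backend, otherwise compile on the device (JIT). This does not use the LiteRT API. `Accelerator`, `Plan`, and `choose_plan` are hypothetical names invented for illustration, and the preference order is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Accelerator(Enum):
    """Hypothetical backend labels; not LiteRT identifiers."""
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Assumed preference order: most specialized backend first, CPU as last resort.
PREFERENCE = (Accelerator.NPU, Accelerator.GPU, Accelerator.CPU)


@dataclass(frozen=True)
class Plan:
    accelerator: Accelerator
    compilation: str  # "AOT" or "JIT"


def choose_plan(available, aot_targets):
    """Pick the first preferred backend that the device offers.

    available:   set of Accelerator values the device reports (hypothetical probe result).
    aot_targets: set of Accelerator values for which a precompiled model was shipped.
    """
    for accelerator in PREFERENCE:
        if accelerator in available:
            # Use the precompiled artifact if one exists, else compile on device.
            mode = "AOT" if accelerator in aot_targets else "JIT"
            return Plan(accelerator, mode)
    raise RuntimeError("no supported accelerator available")


if __name__ == "__main__":
    # Device with an NPU and a precompiled NPU model.
    print(choose_plan({Accelerator.NPU, Accelerator.CPU}, {Accelerator.NPU}))
    # Device without an NPU: falls back to GPU with on-device compilation.
    print(choose_plan({Accelerator.GPU, Accelerator.CPU}, {Accelerator.NPU}))
```

With a unified API the application code stays the same in both cases; only the chosen backend and compilation path differ.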

In practice

Use LiteRT for real-time video effects.
Deploy large models without performance loss.
Achieve high FPS for real-time animation.
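
A frame-rate target translates directly into a per-frame latency budget: at 30 FPS, each frame has 1000 / 30 ≈ 33.3 ms for inference plus everything else in the pipeline. The sketch below shows one way to check measured latencies against that budget. The function names and the 95th-percentile criterion are assumptions for illustration, not something taken from the article or from LiteRT's benchmarking tools.

```python
import math


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the target frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def meets_target(latencies_ms, target_fps: float, percentile: float = 0.95) -> bool:
    """Return True if the chosen latency percentile fits within the frame budget.

    Uses the nearest-rank percentile so that occasional slow frames count.
    """
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1] <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    samples = [20.0, 25.0, 30.0, 31.0, 32.0]  # made-up per-frame latencies in ms
    print(round(frame_budget_ms(30), 1))
    print(meets_target(samples, target_fps=30))
```

Checking a high percentile rather than the mean is a design choice: real-time animation stutters on the slow frames, not the average one.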

Topics

LiteRT Framework
Neural Processing Units
On-Device AI
Cross-Platform Acceleration
Real-time AI Applications

Best for: Computer Vision Engineer, AI Architect, NLP Engineer, AI Engineer, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Google Developers Blog - AI.