TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google
Summary
Google AI Edge is actively developing and deploying AI models on edge devices, focusing on tiny LLMs and agent skills. Cormac Brick, a tech lead at Google, detailed the benefits of edge AI, including reduced latency, enhanced privacy, offline functionality, and cost savings. The team uses MediaPipe, LiteRT-LM, and LiteRT (formerly TensorFlow Lite) to deploy models across Android, iOS, macOS, Linux, Windows, web, and IoT devices. The recently launched Gemma 4 models (E2B and E4B, optimized for 2 billion and 4 billion parameters respectively) feature built-in function calling and thinking capabilities, which enable on-device agent skills. These models support multimodal inputs (audio, image, text) and are released under an Apache 2.0 license. Google AI Gallery, an open-source app, showcases these agent skills, allowing dynamic tool access and integration of domain-specific knowledge bases. The presentation also covered the workflow for deploying tiny models (under 1 billion parameters) inside apps, emphasizing fine-tuning with synthetic data and NPU optimization.
Key takeaway
For AI Architects and NLP Engineers evaluating on-device AI solutions, Gemma 4 and tiny LLMs open up two practical options. Consider the Apache 2.0-licensed Gemma 4 models for system-level GenAI, and fine-tuned tiny models for in-app features that require privacy, low latency, or offline capability. Use the Google AI Gallery app and its open-source infrastructure to prototype and deploy custom agent skills, especially on robotics or IoT platforms, where hot-swapping LoRAs can reduce memory usage.
Key insights
Edge AI enables low-latency, private, and cost-effective on-device model deployment, leveraging tiny LLMs and agent skills.
Principles
- Prioritize token efficiency for edge models.
- Modularity in tiny models optimizes resource use.
- Fine-tuning significantly boosts tiny model reliability.
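One way to read the modularity principle is through the LoRA hot-swapping mentioned in the takeaway: a single base model stays in memory while small task-specific adapters are swapped in and out. The sketch below only counts parameters for one weight matrix, using the standard LoRA formulation (two low-rank factors of shapes d_out x r and r x d_in). The dimensions and rank are hypothetical illustration values, not figures from the talk or from any Gemma model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA adapter for that matrix: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical sizes chosen only to make the arithmetic easy to follow.
D_OUT, D_IN, RANK = 2048, 2048, 8

dense = full_params(D_OUT, D_IN)          # 4194304
adapter = lora_params(D_OUT, D_IN, RANK)  # 32768
ratio = dense / adapter                   # 128.0

print(f"dense={dense} adapter={adapter} ratio={ratio}")
```

Under these assumed sizes, each extra task costs roughly 1/128 of the matrix it adapts, which is why keeping one base model and swapping adapters can be cheaper than holding several full models.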
Method
Agent skills are built with on-demand loaded instructions and dynamic tool access, using a system prompt, skill descriptions, and JavaScript for execution. Tiny models are fine-tuned with synthetic data generated by larger cloud LLMs.
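The on-demand loading pattern described above can be sketched as a small registry: only short skill descriptions go into the system prompt, and a skill's full instructions are fetched when the model selects it. This is a minimal illustration, not the Google AI Gallery implementation. The talk describes JavaScript for skill execution; Python is used here only to show the loading logic, and the names `Skill`, `SkillRegistry`, and `activate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str                      # short text, always in the system prompt
    load_instructions: Callable[[], str]  # full instructions, fetched only on demand


class SkillRegistry:
    """Hypothetical registry: keeps the prompt small until a skill is activated."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self._loaded: Dict[str, str] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def system_prompt(self) -> str:
        # Only names and one-line descriptions are listed, to save tokens.
        lines = ["You can activate one of these skills by name:"]
        for skill in self._skills.values():
            lines.append(f"- {skill.name}: {skill.description}")
        return "\n".join(lines)

    def activate(self, name: str) -> str:
        # Load the full instructions the first time a skill is requested,
        # then reuse the cached copy.
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        if name not in self._loaded:
            self._loaded[name] = self._skills[name].load_instructions()
        return self._loaded[name]

    def loaded_skills(self) -> List[str]:
        return sorted(self._loaded)


registry = SkillRegistry()
registry.register(
    Skill(
        name="unit_converter",
        description="Convert between metric and imperial units.",
        load_instructions=lambda: "Full step-by-step conversion instructions.",
    )
)
print(registry.system_prompt())
print(registry.activate("unit_converter"))
```

The design choice worth noting is that the cost of a skill's long instructions is paid only in conversations that use it, which fits the token-efficiency principle for edge models.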
In practice
- Use Gemma 4 E2B/E4B for on-device agent skills.
- Explore Google AI Gallery for skill prototyping.
- Fine-tune tiny LLMs for specific in-app tasks.
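For the fine-tuning item, the synthetic-data step can be sketched as follows: a larger "teacher" model labels seed inputs, and the resulting pairs are written as JSONL for a fine-tuning job. This is a rough sketch under assumptions. The teacher is passed in as a plain function so that no real API is implied, `fake_teacher` is a stand-in used only to make the example runnable, and the `prompt`/`response` field names are hypothetical rather than a format required by any particular trainer.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_synthetic_dataset(
    seed_inputs: Iterable[str],
    teacher: Callable[[str], str],
    instruction: str,
) -> List[Dict[str, str]]:
    """Ask the teacher to label each unique, non-empty seed input."""
    records: List[Dict[str, str]] = []
    seen = set()
    for text in seed_inputs:
        text = text.strip()
        if not text or text in seen:
            continue  # skip blanks and duplicates
        seen.add(text)
        label = teacher(f"{instruction}\n\n{text}").strip()
        if not label:
            continue  # drop examples the teacher could not label
        records.append({"prompt": text, "response": label})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def fake_teacher(prompt: str) -> str:
    # Stand-in for a call to a larger cloud LLM (an assumption for this sketch).
    return "positive" if "love" in prompt.lower() else "negative"


seeds = ["I love this app", "I love this app", "  ", "It crashes daily"]
dataset = build_synthetic_dataset(seeds, fake_teacher, "Classify the sentiment.")
print(to_jsonl(dataset))
```

In a real workflow the teacher's outputs would also need review or filtering before training, since the tiny model inherits whatever errors the synthetic labels contain.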
Topics
- Google AI Edge
- LiteRT-LM Runtime
- Gemma Models
- Edge AI Deployment
- Agent Skills
Best for: AI Architect, NLP Engineer, CTO, AI Engineer, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.