TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google

Source: AI Engineer · Field: Technology & Digital (Artificial Intelligence & Machine Learning; Internet of Things (IoT) & Connected Devices; Robotics & Autonomous Systems) · Depth: Intermediate, extended

Summary

Google AI Edge is developing and deploying AI models on edge devices, with a focus on tiny LLMs and agent skills. Cormac Brick, a tech lead at Google, outlined the benefits of edge AI: lower latency, stronger privacy, offline functionality, and cost savings. The team uses MediaPipe, LiteRT-LM, and LiteRT (formerly TensorFlow Lite) for cross-platform deployment on Android, iOS, macOS, Linux, Windows, web, and IoT devices. The recently launched Gemma 4 models (E2B and E4B, at roughly 2 billion and 4 billion effective parameters respectively) have built-in function calling and thinking capabilities, which make on-device agent skills possible. These models accept multimodal inputs (audio, image, text) and are released under an Apache 2.0 license. Google AI Edge Gallery, an open-source app, showcases these agent skills, including dynamic tool access and integration of domain-specific knowledge bases. The presentation also covered the workflow for tiny models (under 1 billion parameters) intended for in-app deployment, with emphasis on fine-tuning with synthetic data and on NPU optimization.

Key takeaway

For AI Architects and NLP Engineers evaluating on-device AI, Gemma 4 and tiny LLMs open up practical options. Consider the Apache 2.0-licensed Gemma 4 models for system-level GenAI, and fine-tuned tiny models for in-app features that need privacy, low latency, or offline capability. The Google AI Edge Gallery app and its open-source infrastructure can be used to prototype and deploy custom agent skills, especially on robotics or IoT platforms, where hot-swapping LoRAs can reduce memory usage.
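A rough back-of-the-envelope sketch of why swapping LoRA adapters can save memory compared with keeping one fully fine-tuned model per task. Only the 2-billion-parameter base size echoes the summary above; the layer count, hidden size, rank, quantization widths, and task count are hypothetical values chosen for illustration, not figures from the talk or from any Gemma model card.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def size_bytes(n_params: int, bits_per_param: int) -> int:
    """Storage for n_params values at the given bit width."""
    return n_params * bits_per_param // 8


# All values below are illustrative assumptions.
BASE_PARAMS = 2_000_000_000      # base model size, echoing the "2 billion" figure
BASE_BITS = 4                    # assumed 4-bit quantized base weights
HIDDEN = 2048                    # assumed hidden size
LAYERS = 30                      # assumed number of transformer layers
ADAPTED_PER_LAYER = 4            # assumed adapted matrices per layer
RANK = 16                        # assumed LoRA rank
ADAPTER_BITS = 16                # assumed 16-bit adapter weights
N_TASKS = 5                      # assumed number of task-specific variants

adapter_params = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
base_bytes = size_bytes(BASE_PARAMS, BASE_BITS)
adapter_bytes = size_bytes(adapter_params, ADAPTER_BITS)

# Option A: one fully fine-tuned copy of the model per task.
full_copies_bytes = N_TASKS * base_bytes
# Option B: one shared base with every adapter kept resident.
all_adapters_bytes = base_bytes + N_TASKS * adapter_bytes
# Option C: one shared base with a single adapter swapped in at a time.
hot_swap_bytes = base_bytes + adapter_bytes

for label, value in [
    ("full copies", full_copies_bytes),
    ("base + all adapters", all_adapters_bytes),
    ("base + one hot-swapped adapter", hot_swap_bytes),
]:
    print(f"{label}: {value / 1e9:.3f} GB")
```

Under these assumptions each adapter is a small fraction of the base model, so the shared base dominates the footprint and the number of tasks barely changes it.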

Key insights

Edge AI enables low-latency, private, and cost-effective on-device model deployment, leveraging tiny LLMs and agent skills.

Method

Agent skills combine a system prompt, short skill descriptions, instructions that are loaded on demand, and dynamic tool access, with JavaScript handling execution. Tiny models are fine-tuned on synthetic data generated by larger cloud LLMs.
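The on-demand loading pattern described here can be sketched as a small registry. This is a minimal illustration, not the Gallery app's or LiteRT-LM's actual API: the `Skill` and `SkillRegistry` classes, the skill names, and the `convert_units` tool are all hypothetical, and plain Python functions stand in for the JavaScript execution layer mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str   # short line that is always present in the system prompt
    instructions: str  # full text, handed to the model only after the skill is loaded
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)


class SkillRegistry:
    """Hypothetical registry: keeps the prompt small, loads detail on demand."""

    def __init__(self, skills: List[Skill]) -> None:
        self._skills = {s.name: s for s in skills}
        self._active: Dict[str, Skill] = {}

    def system_prompt(self) -> str:
        # Only names and one-line descriptions go into the prompt.
        lines = ["You can load a skill by name when you need it.", "Available skills:"]
        lines += [f"- {s.name}: {s.description}" for s in self._skills.values()]
        return "\n".join(lines)

    def load(self, name: str) -> str:
        # Activating a skill returns its full instructions and exposes its tools.
        skill = self._skills[name]
        self._active[name] = skill
        return skill.instructions

    def available_tools(self) -> List[str]:
        return sorted(t for s in self._active.values() for t in s.tools)

    def call(self, tool: str, **kwargs) -> str:
        for skill in self._active.values():
            if tool in skill.tools:
                return skill.tools[tool](**kwargs)
        raise KeyError(f"tool {tool!r} is not exposed by any loaded skill")


def convert_units(value: float, factor: float) -> str:
    """Hypothetical tool: multiply a value by a conversion factor."""
    return f"{value * factor:.2f}"


registry = SkillRegistry([
    Skill("unit_converter", "Convert between measurement units.",
          "Call convert_units(value, factor) and report the result.",
          {"convert_units": convert_units}),
    Skill("trail_guide", "Answer questions from a hiking knowledge base.",
          "Quote the knowledge base and say so when the answer is missing."),
])

prompt = registry.system_prompt()                # descriptions only
tools_before = registry.available_tools()        # [] because nothing is loaded yet
instructions = registry.load("unit_converter")   # full instructions, on demand
tools_after = registry.available_tools()         # ["convert_units"]
result = registry.call("convert_units", value=2.0, factor=1.5)  # "3.00"
```

Keeping only one-line descriptions in the system prompt is the point of the design: a small model's limited context is spent on full instructions and tool definitions only for the skills it actually selects.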

Best for: AI Architect, NLP Engineer, CTO, AI Engineer, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.