Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Summary
Ferret-UI Lite is a compact, end-to-end GUI agent designed for on-device operation across mobile, web, and desktop platforms, developed by Zhen Yang et al. and published in February 2026. The 3-billion-parameter model was built with techniques suited to small models: curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance with chain-of-thought reasoning and visual tool-use, and applying reinforcement learning with custom rewards. Ferret-UI Lite is competitive with other small-scale GUI agents. For GUI grounding, it scores 91.6% on ScreenSpot-V2, 53.3% on ScreenSpot-Pro, and 61.2% on OSWorld-G. For GUI navigation, it reaches success rates of 28.0% on AndroidWorld and 19.8% on OSWorld. The work shares methods and lessons from developing compact, on-device GUI agents.
Key takeaway
For AI scientists developing on-device GUI agents, Ferret-UI Lite shows that compact models can reach competitive performance. Consider combining a diverse data mixture, chain-of-thought reasoning, and visual tool-use with reinforcement learning to improve your agent's efficiency and accuracy on resource-constrained platforms.
Key insights
Ferret-UI Lite is a compact, end-to-end GUI agent optimized for on-device performance across diverse platforms.
Principles
- Diverse data improves GUI agent performance.
- Chain-of-thought enhances inference for small models.
- Reinforcement learning refines agent behavior.
Method
The Ferret-UI Lite agent was built by curating diverse GUI data, strengthening inference with chain-of-thought and visual tool-use, and applying reinforcement learning with designed rewards.
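One common form of visual tool-use for grounding is a zoom-in pass: predict a coarse location on the full screenshot, crop around it, predict again on the crop, and map the result back to full-image coordinates. The sketch below illustrates only that coordinate bookkeeping. It is an assumption about what such a tool might look like, not the method described in the original article; `predict`, `crop_box_around`, `to_full_image`, and `zoom_in_ground` are hypothetical names introduced here.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in full-image pixels


def crop_box_around(point: Point, image_size: Tuple[int, int],
                    crop_size: Tuple[int, int]) -> Box:
    """Return a crop window centered on `point`, clamped to the image bounds."""
    width, height = image_size
    crop_w = min(crop_size[0], width)
    crop_h = min(crop_size[1], height)
    left = int(round(point[0] - crop_w / 2))
    top = int(round(point[1] - crop_h / 2))
    # Shift the window back inside the image if it spills over an edge.
    left = max(0, min(left, width - crop_w))
    top = max(0, min(top, height - crop_h))
    return (left, top, left + crop_w, top + crop_h)


def to_full_image(point_in_crop: Point, box: Box) -> Point:
    """Translate a point from crop coordinates to full-image coordinates."""
    return (box[0] + point_in_crop[0], box[1] + point_in_crop[1])


def zoom_in_ground(predict: Callable[[Optional[Box]], Point],
                   image_size: Tuple[int, int],
                   crop_size: Tuple[int, int]) -> Point:
    """Two-pass grounding: coarse guess on the full image, refined on a crop.

    `predict` is a hypothetical stand-in for the model call. It receives the
    region to look at (None means the whole screenshot) and returns a point
    in that region's own coordinate frame.
    """
    coarse = predict(None)
    box = crop_box_around(coarse, image_size, crop_size)
    fine = predict(box)
    return to_full_image(fine, box)
```

Clamping the crop window rather than padding it keeps every crop the same size, which is convenient when the model expects a fixed input resolution; whether that matches the original system is not stated in this summary.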
In practice
- Use mixed real and synthetic GUI data.
- Implement visual tool-use for GUI tasks.
- Apply RL for on-device agent optimization.
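To make the idea of a designed reward concrete, here is a minimal sketch of a rule-based reward for a single GUI step. The summary does not specify the rewards used for Ferret-UI Lite, so everything here is an illustrative assumption: the point-in-box check, the format gate, the equal weights, and the names `grounding_reward` and `step_reward` are all hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def grounding_reward(pred_point: Point, target_box: Box) -> float:
    """1.0 if the predicted point lands inside the target element, else 0.0."""
    x, y = pred_point
    left, top, right, bottom = target_box
    return 1.0 if left <= x <= right and top <= y <= bottom else 0.0


def step_reward(pred_action: str, pred_point: Point,
                gold_action: str, target_box: Box,
                format_ok: bool,
                action_weight: float = 0.5,
                point_weight: float = 0.5) -> float:
    """Combine format, action-type, and grounding checks into one scalar.

    Assumed design: malformed outputs and wrong action types earn nothing;
    a correct action type earns `action_weight`, plus `point_weight` if the
    predicted point also falls inside the target element.
    """
    if not format_ok or pred_action != gold_action:
        return 0.0
    return action_weight + point_weight * grounding_reward(pred_point, target_box)
```

For example, `step_reward("tap", (50, 50), "tap", (0, 0, 100, 100), True)` gives full credit, while the same call with a point outside the box keeps only the action-type share.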
Topics
- On-Device GUI Agents
- User Interface Understanding
- Reinforcement Learning
- Chain-of-Thought Reasoning
- Multimodal LLMs
Best for: AI Scientist, Research Scientist, AI Researcher, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.