LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
Summary
LiteGUI introduces a training paradigm that skips supervised fine-tuning (SFT) for lightweight, on-device vision-language GUI agents, targeting two weaknesses of current small-scale models: overfitting and policy rigidity. Its Guided On-policy Distillation brings generalized knowledge distillation to the GUI agent domain, using oracle reference trajectories and a dynamic retrieval mechanism to reduce hallucinations and cognitive misalignment in multi-solution tasks. LiteGUI also incorporates a Multi-solution Dual-level GRPO (Group Relative Policy Optimization) framework that aligns macro-level subtask planning with micro-level execution matching, which improves exploration in long-horizon GUI scenarios. An automated data generation pipeline synthesizes GUI task trajectories with multi-solution annotations. Experiments show that LiteGUI achieves state-of-the-art performance among lightweight models, is competitive with larger models, and extends 2B/3B-scale agents beyond what conventional imitation learning achieves.
Key takeaway
For AI engineers developing on-device vision-language GUI agents, LiteGUI's SFT-free training paradigm offers a way to improve performance without increasing model size. Consider integrating Guided On-policy Distillation and the Multi-solution Dual-level GRPO framework to counter overfitting and improve exploration in complex, multi-solution GUI tasks, which may unlock more capability from 2B/3B-scale agents.
Key insights
A novel SFT-free training paradigm enhances lightweight GUI agents via guided distillation and dual-level reinforcement learning.
Principles
- On-policy distillation reduces hallucinations.
- Dual-level alignment improves long-horizon exploration.
- Automated data generation enriches training.
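The first principle can be made concrete with a small sketch. In on-policy distillation, the student samples its own sequence and is then penalized, token by token, for diverging from the teacher's distribution at those same positions. The snippet below is only an illustration under stated assumptions: the function names are hypothetical, reverse KL is one common choice of divergence in generalized knowledge distillation rather than a confirmed detail of LiteGUI, and the guidance from oracle trajectories and retrieval is not modeled.

```python
import math

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for a single token position.

    Both arguments are probability vectors over the same vocabulary.
    eps guards against log(0); it is an illustrative choice.
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0
    )

def on_policy_distill_loss(student_dists, teacher_dists):
    """Mean per-token divergence over a sequence sampled from the student.

    student_dists[i] and teacher_dists[i] are the two models' next-token
    distributions at position i of the student's own rollout (hypothetical
    inputs; a real pipeline would obtain them from model logits).
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(per_token) / len(per_token)

# Toy two-token vocabulary, two positions in a student-sampled sequence.
student = [[0.9, 0.1], [0.5, 0.5]]
teacher = [[0.5, 0.5], [0.5, 0.5]]
loss = on_policy_distill_loss(student, teacher)
```

Because the sequence comes from the student itself, the loss is measured on the states the student actually visits, which is the usual argument for why on-policy distillation curbs compounding errors compared with imitating fixed demonstrations.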
Method
LiteGUI employs Guided On-policy Distillation with oracle trajectories and dynamic retrieval, combined with a Multi-solution Dual-level GRPO framework for macro-level planning and micro-level execution alignment.
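As a rough illustration of the reinforcement-learning half, the sketch below shows the group-relative advantage at the core of GRPO: several rollouts are sampled for the same task, and each rollout's reward is standardized against its own group rather than against a learned value function. The dual-level reward here is an assumption made for illustration only; the summary does not state how LiteGUI combines macro-level and micro-level signals, so the weighted sum, the `macro_weight` parameter, and both function names are hypothetical.

```python
from statistics import mean, pstdev

def dual_level_reward(macro_score, micro_score, macro_weight=0.5):
    """Blend a plan-level score and an execution-level score (both in [0, 1]).

    The linear blend and its weight are assumptions, not LiteGUI's formula.
    """
    return macro_weight * macro_score + (1.0 - macro_weight) * micro_score

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group.

    Uses the population standard deviation (an illustrative choice); eps
    keeps the division defined when every rollout gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts of the same GUI task, each scored at both levels.
scores = [(1.0, 1.0), (1.0, 0.5), (0.5, 0.0), (0.0, 0.0)]
rewards = [dual_level_reward(macro, micro) for macro, micro in scores]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group average receive positive advantages and are reinforced; those below it are discouraged. If every rollout in a group scores the same, all advantages are zero and that group contributes no gradient signal.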
In practice
- Apply Guided On-policy Distillation for small models.
- Use dual-level GRPO for complex GUI tasks.
- Synthesize multi-solution GUI task data.
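To see why multi-solution annotations matter, consider how an agent's action sequence might be scored when a task has more than one valid path. The sketch below is a minimal, hypothetical matching scheme: it compares the predicted steps position by position against each annotated solution and keeps the best score, so the agent is not penalized for choosing a valid alternative route. The action names, the exact-match criterion, and both function names are assumptions for illustration, not LiteGUI's actual matching rule.

```python
def step_match_ratio(predicted, reference):
    """Fraction of aligned positions where the predicted step equals the reference.

    Dividing by the longer length penalizes both missing and extra steps.
    """
    if not predicted and not reference:
        return 1.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / max(len(predicted), len(reference))

def best_solution_score(predicted, solutions):
    """Score a trajectory against every annotated solution and keep the best."""
    if not solutions:
        raise ValueError("at least one reference solution is required")
    return max(step_match_ratio(predicted, solution) for solution in solutions)

# Two hypothetical ways to turn on Wi-Fi; the agent follows the second.
solutions = [
    ["swipe_down", "long_press_wifi", "toggle_on"],
    ["open_settings", "tap_wifi", "toggle_on"],
]
predicted = ["open_settings", "tap_wifi", "toggle_on"]
score = best_solution_score(predicted, solutions)
single_reference_score = step_match_ratio(predicted, solutions[0])
```

With only the first reference available, the same correct trajectory would match on just one of three steps, which is the kind of misleading signal that single-solution data can produce.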
Topics
- LiteGUI
- GUI Agents
- Knowledge Distillation
- Reinforcement Learning
- On-device AI
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.