LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

LiteGUI introduces an SFT-free training paradigm for lightweight, on-device vision-language GUI agents, targeting two weaknesses of current small-scale models: overfitting and policy rigidity. Its first component, Guided On-policy Distillation, brings generalized knowledge distillation into the GUI agent domain, using oracle reference trajectories and a dynamic retrieval mechanism to reduce hallucinations and cognitive misalignment in multi-solution tasks. Its second component, a Multi-solution Dual-level GRPO framework, aligns macro-level subtask planning with micro-level execution matching to improve exploration in long-horizon GUI scenarios. An automated data generation pipeline synthesizes GUI task trajectories with multi-solution annotations. Experiments show that LiteGUI achieves state-of-the-art performance among lightweight models, is competitive with larger models, and unlocks capabilities of 2B/3B-scale agents beyond what conventional imitation learning delivers.
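To make the distillation idea concrete, here is a minimal sketch of an on-policy distillation loss in plain Python. It is not LiteGUI's implementation. It only illustrates the general pattern of generalized knowledge distillation: the student samples its own sequence, and a teacher scores the same token positions. The choice of reverse KL as the divergence, the function names, and the toy logits are all assumptions for illustration. In a guided setup one might additionally condition the teacher on a retrieved oracle trajectory, which is likewise an assumption and is represented here only by the teacher logits passed in.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(
        s * (math.log(s) - math.log(t))
        for s, t in zip(p_s, p_t)
        if s > 0.0
    )

def on_policy_distill_loss(student_steps, teacher_steps):
    """Mean per-token divergence over a student-sampled sequence.

    student_steps / teacher_steps: one logit vector per token position.
    The teacher logits are assumed to be computed on the student's own
    sampled tokens (hypothetical setup, not taken from the paper).
    """
    if len(student_steps) != len(teacher_steps):
        raise ValueError("sequences must have the same length")
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)

# Toy example: two token positions, vocabulary of three actions.
student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
teacher = [[2.5, 0.2, 0.1], [0.1, 0.4, 2.0]]
loss = on_policy_distill_loss(student, teacher)
```

The loss is zero when the student's distribution matches the teacher's at every position and grows as they diverge, which is what lets the teacher correct the student on states the student actually visits.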

Key takeaway

For AI engineers developing on-device vision-language GUI agents, LiteGUI's SFT-free training paradigm offers a way to improve performance without increasing model size. Consider integrating Guided On-policy Distillation and the Multi-solution Dual-level GRPO framework to counter overfitting and improve exploration in complex, multi-solution GUI tasks, potentially unlocking greater capabilities from 2B/3B-scale agents.

Key insights

A novel SFT-free training paradigm enhances lightweight GUI agents via guided distillation and dual-level reinforcement learning.

Method

LiteGUI employs Guided On-policy Distillation with oracle trajectories and dynamic retrieval, combined with a Multi-solution Dual-level GRPO framework for macro-level planning and micro-level execution alignment.
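The dual-level, multi-solution idea can be sketched as a reward function plus the group-relative advantage that GRPO is known for. This is a hedged illustration, not the paper's formulation: the scoring functions, the equal weighting `w_macro=0.5`, the dictionary layout, and the example task are all hypothetical. The only part taken from GRPO as generally described is normalizing each rollout's reward by the mean and standard deviation of its sampling group.

```python
from statistics import mean, pstdev

def macro_score(pred_subtasks, ref_subtasks):
    """Share of reference subtasks that appear in the predicted plan."""
    ref = set(ref_subtasks)
    if not ref:
        return 0.0
    return len(ref & set(pred_subtasks)) / len(ref)

def micro_score(pred_actions, ref_actions):
    """Share of step positions where the predicted action matches."""
    longest = max(len(pred_actions), len(ref_actions))
    if longest == 0:
        return 0.0
    hits = sum(p == r for p, r in zip(pred_actions, ref_actions))
    return hits / longest

def dual_level_reward(rollout, solutions, w_macro=0.5):
    """Best combined macro/micro score over all annotated solutions.

    Taking the max means a rollout is not penalized for following a
    valid solution path other than the first annotated one.
    """
    return max(
        w_macro * macro_score(rollout["subtasks"], sol["subtasks"])
        + (1.0 - w_macro) * micro_score(rollout["actions"], sol["actions"])
        for sol in solutions
    )

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward standardized within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical task "turn on Wi-Fi" with two valid solution paths.
solutions = [
    {"subtasks": ["open_settings", "toggle_wifi"],
     "actions": ["tap:settings", "tap:wifi"]},
    {"subtasks": ["open_quick_panel", "toggle_wifi"],
     "actions": ["swipe:down", "tap:wifi"]},
]
rollout = {"subtasks": ["open_quick_panel", "toggle_wifi"],
           "actions": ["swipe:down", "tap:wifi"]}
reward = dual_level_reward(rollout, solutions)
advantages = group_relative_advantages([reward, 0.5, 0.0])
```

Scoring against a single reference would give this rollout only partial credit even though it completes the task. Taking the best match across solutions is one simple way to reward valid alternative paths.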

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.