Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, quick

Summary

GUI-SD is a novel on-policy self-distillation (OPSD) framework for Graphical User Interface (GUI) grounding, the task of mapping a natural language instruction to the screen coordinates of the target element. It addresses two limitations of existing reinforcement learning methods such as GRPO: they require multiple expensive rollouts, and they struggle with sparse reward signals on difficult samples. GUI-SD strengthens teacher guidance by building a visually enriched privileged context from the target bounding box and a Gaussian soft mask, which gives informative cues without revealing exact coordinates. It also uses entropy-guided distillation, which adaptively weights tokens by digit significance and teacher confidence so that optimization focuses on critical and reliable positions. Experiments on six GUI grounding benchmarks show that GUI-SD consistently surpasses GRPO-based methods and naive OPSD in both accuracy and training efficiency.
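
To make the Gaussian soft mask concrete, here is a minimal sketch of one way such a mask could be computed from a target bounding box. The summary does not give the formula, so everything here is an assumption: the function name `gaussian_soft_mask`, the choice to center the Gaussian on the box center, and the hypothetical `sigma_scale` parameter that ties the spread to the box size. The point is only that a soft mask highlights the target region without encoding exact coordinates as text.

```python
import math


def gaussian_soft_mask(width, height, bbox, sigma_scale=0.5):
    """Return a height x width grid of weights in (0, 1], peaking at the bbox center.

    bbox is (x1, y1, x2, y2) in pixels. sigma_scale is a hypothetical knob
    (not from the source): the standard deviation along each axis is
    sigma_scale times the box size along that axis.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Guard against degenerate boxes with zero width or height.
    sx = max(sigma_scale * (x2 - x1), 1e-6)
    sy = max(sigma_scale * (y2 - y1), 1e-6)

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            z = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            row.append(math.exp(-0.5 * z))
        mask.append(row)
    return mask


# Toy 8x6 "screenshot" with a target box whose center is pixel (4, 3).
mask = gaussian_soft_mask(width=8, height=6, bbox=(2, 1, 6, 5))
for row in mask:
    print(" ".join(f"{v:.2f}" for v in row))
```

In a real pipeline the mask would presumably be blended with the screenshot shown to the teacher; how GUI-SD does that blending is not stated in this summary.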

Key takeaway

For research scientists developing autonomous GUI agents, GUI-SD offers a more efficient and accurate approach to GUI grounding than traditional GRPO methods. You should consider integrating on-policy self-distillation with visually enriched contexts and entropy-guided token weighting to improve model performance and training efficiency, especially when dealing with sparse reward signals.

Key insights

GUI-SD improves GUI grounding via on-policy self-distillation with visually enriched context and entropy-guided token weighting.

Principles

Method

GUI-SD constructs a privileged teacher context with a target bounding box and Gaussian soft mask, then uses entropy-guided distillation to adaptively weight tokens based on digit significance and teacher confidence.
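
The token weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the summary does not specify the weighting function, so the geometric `decay` for digit significance, the `1 - normalized entropy` confidence term, the normalization of weights, and the KL(teacher || student) direction are all hypothetical choices. In on-policy self-distillation the teacher and student would be the same model, with only the teacher seeing the privileged context; here both are represented by toy probability tables over a 4-symbol vocabulary.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def token_weights(teacher_dists, decay=0.5):
    """Per-token weights from digit position and teacher confidence.

    Assumptions (not from the source): earlier digits matter more, modeled
    as decay**position; confidence is 1 minus entropy normalized by its
    maximum; the vocabulary has more than one symbol.
    """
    weights = []
    for pos, dist in enumerate(teacher_dists):
        significance = decay ** pos
        max_entropy = math.log(len(dist))
        confidence = max(0.0, 1.0 - entropy(dist) / max_entropy)
        weights.append(significance * confidence)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def weighted_distill_loss(teacher_dists, student_dists, decay=0.5):
    """Weighted sum of per-token KL(teacher || student)."""
    weights = token_weights(teacher_dists, decay)
    return sum(
        w * kl_divergence(t, s)
        for w, t, s in zip(weights, teacher_dists, student_dists)
    )


# Three coordinate-digit positions: confident, less confident, and uniform teacher.
teacher = [
    [0.97, 0.01, 0.01, 0.01],
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
student = [
    [0.40, 0.20, 0.20, 0.20],
    [0.40, 0.20, 0.20, 0.20],
    [0.40, 0.20, 0.20, 0.20],
]

weights = token_weights(teacher)
loss = weighted_distill_loss(teacher, student)
print("weights:", [round(w, 3) for w in weights])
print("loss:", round(loss, 4))
```

With these toy inputs, the leading digit where the teacher is confident gets the largest weight, and the position where the teacher is uniform (maximally uncertain) contributes almost nothing to the loss.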

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.