GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
Summary
GUI-Libra is a training recipe designed to improve open-source native GUI agents on long-horizon navigation tasks, addressing limitations in data quality and in generic post-training pipelines. It tackles two core issues: the scarcity of high-quality, action-aligned reasoning data, and the partial verifiability inherent in step-wise reinforcement learning with verifiable rewards (RLVR). GUI-Libra introduces an 81K GUI reasoning dataset, constructed and filtered so that reasoning traces align with the actions taken. It also proposes action-aware supervised fine-tuning (SFT), which blends reasoning-then-action and direct-action data and reweights tokens to prioritize action and grounding. Finally, it stabilizes RL under partial verifiability by relying on KL regularization to keep the policy within a trust region and by applying success-adaptive scaling to control negative gradients. This approach consistently improves both step-wise accuracy and end-to-end task completion across web and mobile benchmarks.
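The summary does not spell out the filtering criteria. As a rough illustration of what "action-aligned" can mean, the sketch below keeps a reasoning trace only when the action it ends with matches the ground-truth action: same action type, a click point inside the target element's bounding box, and matching typed text. The `Step` and `Target` schemas and the `is_action_aligned` and `filter_dataset` helpers are hypothetical, not GUI-Libra's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    """Action parsed from the end of a reasoning trace (hypothetical schema)."""
    action_type: str                      # e.g. "click", "type", "scroll"
    point: Optional[Tuple[float, float]]  # predicted click coordinate, if any
    text: Optional[str] = None            # typed text, if any


@dataclass
class Target:
    """Ground-truth action for the same step (hypothetical schema)."""
    action_type: str
    bbox: Optional[Tuple[float, float, float, float]]  # (x0, y0, x1, y1)
    text: Optional[str] = None


def is_action_aligned(pred: Step, gold: Target) -> bool:
    """Return True if the predicted action agrees with the gold action."""
    if pred.action_type != gold.action_type:
        return False
    if gold.bbox is not None:
        # Grounding check: the click must land inside the target element.
        if pred.point is None:
            return False
        x, y = pred.point
        x0, y0, x1, y1 = gold.bbox
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            return False
    if gold.text is not None:
        # Text check: case- and whitespace-insensitive match.
        if pred.text is None or pred.text.strip().lower() != gold.text.strip().lower():
            return False
    return True


def filter_dataset(samples: List[Tuple[str, Step, Target]]) -> List[Tuple[str, Step, Target]]:
    """Keep (reasoning, predicted_action, gold_action) triples whose action is aligned."""
    return [s for s in samples if is_action_aligned(s[1], s[2])]


if __name__ == "__main__":
    gold = Target("click", (100, 200, 180, 240))
    kept = filter_dataset([
        ("The Search button is at top right, so click it.", Step("click", (150, 220)), gold),
        ("Click the menu icon.", Step("click", (10, 10)), gold),
    ])
    print(len(kept))  # 1
```

A real pipeline would likely add further checks (for example, whether the reasoning actually mentions the target element); this sketch covers only the action-level agreement.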
Key takeaway
For research scientists developing open-source native GUI agents, GUI-Libra shows that careful post-training and data curation can substantially improve task-solving capability without expensive online data collection. Consider adopting action-aware SFT and KL-regularized RLVR to improve both step-wise accuracy and end-to-end task completion, especially for long-horizon navigation tasks and partially verifiable environments.
Key insights
Tailored training and data curation significantly boost GUI agent performance on complex navigation tasks.
Principles
- Action-aligned data is crucial for GUI agent reasoning.
- KL regularization stabilizes RL under partial verifiability.
- Mixing reasoning-then-action and direct-action data improves grounding.
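One way to picture mixing reasoning-then-action and direct-action data is to render each training example in one of two target formats. This sketch assumes a hypothetical `<think>`/`<answer>` tag layout, a hypothetical `direct_ratio` parameter, and random per-example assignment; GUI-Libra's actual format and mixing ratio are not given here.

```python
import random
from typing import Dict, List


def build_target(reasoning: str, action: str, with_reasoning: bool) -> str:
    """Render one SFT target; the <think>/<answer> tags are a hypothetical format."""
    if with_reasoning:
        return f"<think>{reasoning}</think>\n<answer>{action}</answer>"
    return f"<answer>{action}</answer>"


def mix_sft_data(examples: List[Dict[str, str]], direct_ratio: float, seed: int = 0) -> List[Dict[str, str]]:
    """Render roughly `direct_ratio` of the examples as direct-action targets, the rest with reasoning."""
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    mixed = []
    for ex in examples:
        direct = rng.random() < direct_ratio
        mixed.append({
            "prompt": ex["prompt"],
            "target": build_target(ex["reasoning"], ex["action"], with_reasoning=not direct),
            "mode": "direct" if direct else "reason",
        })
    return mixed


if __name__ == "__main__":
    data = [{"prompt": f"screen {i}", "reasoning": "The login button is visible.",
             "action": "click(512, 300)"} for i in range(4)]
    for row in mix_sft_data(data, direct_ratio=0.5):
        print(row["mode"], "|", row["target"].replace("\n", " "))
```

The intuition behind the mix is that direct-action targets put the whole supervision signal on the action and its coordinates, while reasoning targets keep the model's ability to think before acting.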
Method
GUI-Libra uses action-aware SFT with mixed data and token reweighting, combined with KL-regularized RLVR and success-adaptive scaling, to train GUI agents.
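Token reweighting can be read as a weighted cross-entropy in which action and grounding tokens count more than reasoning tokens. The minimal sketch below assumes each target token is already labeled as "reasoning" or "action", and uses illustrative weights and normalization by the total weight; `action_aware_nll` is a hypothetical name, and the weighting scheme is an assumption rather than the paper's reported setting.

```python
import math
from typing import Sequence


def action_aware_nll(
    token_logprobs: Sequence[float],
    token_kinds: Sequence[str],
    reasoning_weight: float = 1.0,
    action_weight: float = 2.0,
) -> float:
    """Weighted negative log-likelihood of the target tokens.

    token_kinds[i] is "action" for action/grounding tokens (e.g. the action name
    and coordinates) and anything else for reasoning tokens. The default weights
    are illustrative; the sum is normalized by the total weight.
    """
    if len(token_logprobs) != len(token_kinds):
        raise ValueError("token_logprobs and token_kinds must have the same length")
    total, norm = 0.0, 0.0
    for lp, kind in zip(token_logprobs, token_kinds):
        w = action_weight if kind == "action" else reasoning_weight
        total += -w * lp
        norm += w
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    # Two reasoning tokens the model predicts well, one action token it predicts poorly.
    logprobs = [math.log(0.9), math.log(0.8), math.log(0.2)]
    kinds = ["reasoning", "reasoning", "action"]
    plain = action_aware_nll(logprobs, kinds, action_weight=1.0)
    weighted = action_aware_nll(logprobs, kinds, action_weight=3.0)
    print(f"uniform: {plain:.3f}  action-weighted: {weighted:.3f}")
```

In this example the action-weighted loss is larger than the uniform one because the poorly predicted action token counts three times, so its gradient dominates the update.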
In practice
- Curate action-aligned reasoning datasets.
- Implement action-aware SFT for GUI agents.
- Apply KL-based trust regions in RL to stay stable under partial verifiability.
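To make the RL ingredients concrete, here is a sketch of a group-based policy-gradient objective with a KL penalty toward a reference policy (using the k3 estimator) and success-adaptive scaling of negative advantages. The scaling rule (`neg_scale` rising from a hypothetical `min_neg_scale` to 1 as the group's success rate rises), the sequence-level log-probabilities, the omission of PPO-style ratio clipping, and the `beta` value are all assumptions for illustration, not GUI-Libra's published formulas. The code computes the scalar objective on plain floats; in training, the same expression would be written in an autodiff framework.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], min_neg_scale: float = 0.2) -> List[float]:
    """Group-relative advantages with a hypothetical success-adaptive scaling of negatives."""
    mean_r = sum(rewards) / len(rewards)
    success_rate = sum(1 for r in rewards if r > 0) / len(rewards)
    # Hypothetical rule: damp negative advantages more when the group rarely succeeds,
    # since a zero reward may be a false negative under partial verifiability.
    neg_scale = min_neg_scale + (1.0 - min_neg_scale) * success_rate
    advs = []
    for r in rewards:
        a = r - mean_r
        advs.append(a * neg_scale if a < 0 else a)
    return advs


def kl_k3(logp: float, ref_logp: float) -> float:
    """Non-negative single-sample k3 estimate of KL(pi || pi_ref)."""
    d = ref_logp - logp
    return math.exp(d) - d - 1.0


def rl_loss(logps, old_logps, ref_logps, rewards, beta=0.1, min_neg_scale=0.2):
    """Group mean of -(A * ratio) + beta * KL; sequence-level log-probs, no clipping."""
    advs = group_advantages(rewards, min_neg_scale)
    terms = []
    for lp, old, ref, a in zip(logps, old_logps, ref_logps, advs):
        ratio = math.exp(lp - old)  # importance ratio against the rollout policy
        terms.append(-a * ratio + beta * kl_k3(lp, ref))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 0.0]  # only the demonstrated action is rewarded
    print(group_advantages(rewards))
    logps = [-0.5, -1.2, -1.0, -2.0]
    print(rl_loss(logps, old_logps=logps, ref_logps=[-0.6, -1.0, -1.1, -1.8], rewards=rewards))
```

With rewards `[1, 0, 0, 0]`, the success rate is 0.25, so `neg_scale` is 0.4 and the three failed rollouts receive advantage -0.1 instead of -0.25, while the successful one keeps 0.75. The KL term acts as a soft trust region: it grows as the policy's log-probabilities drift from the reference policy.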
Topics
- GUI Agents
- Reinforcement Learning
- Supervised Fine-tuning
- Data Curation
- Action-aware SFT
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.