GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

2026-02-25 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

GUI-Libra is a training recipe for improving open-source native GUI agents on long-horizon navigation tasks, where limited data quality and generic post-training pipelines hold performance back. It targets two core issues: the scarcity of high-quality, action-aligned reasoning data, and the partial verifiability inherent in step-wise Reinforcement Learning with Verifiable Rewards (RLVR) training. GUI-Libra introduces an 81K GUI reasoning dataset, constructed and filtered to ensure action alignment. It also proposes action-aware Supervised Fine-Tuning (SFT), which mixes reasoning-then-action and direct-action data and reweights tokens to prioritize action and grounding. To stabilize RL under partial verifiability, it applies KL regularization as a trust region and uses success-adaptive scaling to manage negative gradients. Across web and mobile benchmarks, the approach consistently improves both step-wise accuracy and end-to-end task completion.

Key takeaway

For research scientists developing open-source native GUI agents, GUI-Libra shows that carefully designed post-training and data curation can significantly improve task-solving capability without expensive online data collection. Consider adopting action-aware SFT and KL-regularized RLVR to improve both step-wise accuracy and end-to-end task completion, especially for long-horizon navigation tasks and partially verifiable environments.

Key insights

Tailored training and data curation significantly boost GUI agent performance on complex navigation tasks.

Principles

Action-aligned data is crucial for GUI agent reasoning.
KL regularization stabilizes RL under partial verifiability.
Mixing reasoning and direct action improves grounding.

Method

GUI-Libra uses action-aware SFT with mixed data and token reweighting, combined with KL-regularized RLVR and success-adaptive scaling, to train GUI agents.
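The token reweighting in action-aware SFT can be pictured as a weighted negative log-likelihood. The sketch below is a minimal illustration under stated assumptions, not the recipe's actual loss: the role labels ("reasoning", "action", "grounding"), the weight values, and the function name weighted_token_nll are all hypothetical, since the summary does not give the weighting scheme.

```python
from typing import Mapping, Sequence


def weighted_token_nll(
    token_logprobs: Sequence[float],
    token_roles: Sequence[str],
    role_weights: Mapping[str, float],
) -> float:
    """Weighted mean negative log-likelihood over one target sequence.

    token_logprobs: log-probability the model assigns to each target token.
    token_roles:    one role label per token (hypothetical labels).
    role_weights:   loss weight per role (hypothetical values).
    """
    if len(token_logprobs) != len(token_roles):
        raise ValueError("need exactly one role per token")
    weights = [role_weights[role] for role in token_roles]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all tokens have zero weight")
    weighted_sum = sum(w * lp for w, lp in zip(weights, token_logprobs))
    return -weighted_sum / total_weight


# Assumed weights: grounding (e.g. coordinates) counts most, reasoning least.
ROLE_WEIGHTS = {"reasoning": 0.5, "action": 1.0, "grounding": 2.0}

# A reasoning-then-action sample and a direct-action sample, mixed in one batch.
cot_loss = weighted_token_nll(
    [-0.2, -0.4, -0.1, -1.0],
    ["reasoning", "reasoning", "action", "grounding"],
    ROLE_WEIGHTS,
)
direct_loss = weighted_token_nll(
    [-0.1, -1.0],
    ["action", "grounding"],
    ROLE_WEIGHTS,
)
batch_loss = (cot_loss + direct_loss) / 2
```

With uniform weights this reduces to the ordinary mean token loss, so the weights only shift how much of the gradient comes from action and grounding tokens relative to reasoning tokens.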

In practice

Curate action-aligned reasoning datasets.
Implement action-aware SFT for GUI agents.
Apply a KL trust region when running RL under partial verifiability.
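The RL-side practices above can be sketched as a per-step loss that combines a policy-gradient term, a KL penalty toward a reference policy, and a scale factor on negative advantages. This is a rough illustration under assumptions rather than the published objective: it assumes 0/1 rewards from a step-wise verifier, uses the group success rate as the scale on negative advantages (a hypothetical choice), and borrows the non-negative per-sample KL estimator common in GRPO-style training. The names scaled_advantages, kl_penalty, step_loss, and beta are illustrative.

```python
import math
from typing import List, Sequence


def scaled_advantages(rewards: Sequence[int]) -> List[float]:
    """Mean-centered advantages with negative ones scaled down.

    Assumption: rewards are 0/1, so their mean is the group success rate.
    Negative advantages are multiplied by that rate, so failures count for
    less when few samples pass the verifier.
    """
    success_rate = sum(rewards) / len(rewards)
    advantages = []
    for r in rewards:
        a = r - success_rate
        advantages.append(a if a >= 0 else success_rate * a)
    return advantages


def kl_penalty(logp: float, ref_logp: float) -> float:
    """Non-negative per-sample estimate of KL(policy || reference)."""
    log_ratio = ref_logp - logp
    return math.exp(log_ratio) - log_ratio - 1.0


def step_loss(
    logps: Sequence[float],
    ref_logps: Sequence[float],
    rewards: Sequence[int],
    beta: float,
) -> float:
    """Mean of (-advantage * logp + beta * KL estimate) over sampled actions.

    In real training the advantages are constants and the loss is
    differentiated with respect to logp; here we only compute its value.
    """
    if not (len(logps) == len(ref_logps) == len(rewards)):
        raise ValueError("inputs must have the same length")
    advantages = scaled_advantages(rewards)
    terms = [
        -a * lp + beta * kl_penalty(lp, rlp)
        for a, lp, rlp in zip(advantages, logps, ref_logps)
    ]
    return sum(terms) / len(terms)


# Four sampled actions at one step; only the first passes the verifier.
advs = scaled_advantages([1, 0, 0, 0])
loss = step_loss([-1.0, -1.0], [-1.0, -1.0], [1, 0], beta=0.1)
```

Raising beta keeps the policy closer to the reference, which is the trust-region role that KL regularization plays here. Scaling by the success rate is only one possible way to damp negative gradients.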

Topics

GUI Agents
Reinforcement Learning
Supervised Fine-tuning
Data Curation
Action-aware SFT

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.