OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Summary
OS-Themis is a multi-agent critic framework that aims to make GUI agents more robust in stochastic environments by improving the quality of the reward function used for Reinforcement Learning (RL). It addresses the limitations of existing reward approaches by decomposing trajectories into verifiable milestones and applying a strict review mechanism to the evidence chain before issuing a final verdict. The framework is described as scalable and accurate, with its best reported results on the new OmniGUIRewardBench (OGRBench), a cross-platform benchmark for GUI outcome rewards. Experiments on AndroidWorld show a 10.3% improvement when OS-Themis supports online RL training and a 6.9% gain when it is used for trajectory validation and filtering in a self-training loop.
Key takeaway
For AI scientists and research scientists developing GUI agents, OS-Themis targets reward function quality, which directly affects agent robustness and training efficiency. The reported gains on AndroidWorld are 10.3% in online RL training and 6.9% in self-training loops. Consider using OS-Themis to validate trajectories and refine reward signals when training agents in stochastic GUI environments.
Key insights
OS-Themis is a scalable multi-agent critic framework that improves GUI agent RL training through milestone-based reward decomposition and strict evidence auditing.
Principles
- Decompose complex trajectories into verifiable milestones.
- Strictly audit evidence chains for robust decision-making.
Method
OS-Themis decomposes GUI agent trajectories into verifiable milestones, isolates critical evidence, and uses a multi-agent review mechanism to audit the evidence chain before rendering a final reward verdict.
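The flow described above (milestones, an evidence chain, then a strict review before the verdict) can be illustrated with a minimal Python sketch. This is not the OS-Themis implementation: every name here (`Step`, `Milestone`, `Evidence`, `collect_evidence`, `outcome_reward`) is hypothetical, and the milestone checks and reviewers are simple string predicates standing in for the model-based agents the framework presumably uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Step:
    action: str
    observation: str  # assumed: a text description of the screen after the action


@dataclass
class Milestone:
    description: str
    check: Callable[[Step], bool]  # hypothetical verifier for a single step


@dataclass
class Evidence:
    milestone: str
    step_index: int  # which step is cited as proof of the milestone


# A reviewer audits the whole evidence chain and returns accept/reject.
Reviewer = Callable[[Sequence[Evidence], Sequence[Step]], bool]


def collect_evidence(
    steps: Sequence[Step], milestones: Sequence[Milestone]
) -> Optional[List[Evidence]]:
    """Find, in order, a step that satisfies each milestone; None if one is missing."""
    chain: List[Evidence] = []
    start = 0
    for m in milestones:
        idx = next((i for i in range(start, len(steps)) if m.check(steps[i])), None)
        if idx is None:
            return None  # a milestone with no supporting evidence
        chain.append(Evidence(m.description, idx))
        start = idx  # later milestones may not cite earlier steps
    return chain


def outcome_reward(
    steps: Sequence[Step],
    milestones: Sequence[Milestone],
    reviewers: Sequence[Reviewer],
) -> float:
    """Binary reward: 1.0 only if every milestone has evidence and every reviewer accepts."""
    chain = collect_evidence(steps, milestones)
    if chain is None:
        return 0.0
    return 1.0 if all(review(chain, steps) for review in reviewers) else 0.0


# Toy task: "open Settings and turn on Wi-Fi".
milestones = [
    Milestone("Settings opened", lambda s: "Settings screen" in s.observation),
    Milestone("Wi-Fi enabled", lambda s: "Wi-Fi on" in s.observation),
]
# Strict toy reviewer: the last milestone must hold at the final step.
reviewers = [lambda chain, steps: chain[-1].step_index == len(steps) - 1]

good = [Step("tap Settings", "Settings screen"), Step("tap toggle", "Wi-Fi on")]
bad = [Step("tap Settings", "Settings screen"), Step("tap Back", "Home screen")]

good_reward = outcome_reward(good, milestones, reviewers)
bad_reward = outcome_reward(bad, milestones, reviewers)
```

Requiring unanimous reviewer approval is one way to read "strict" here; a real system might instead weight or aggregate reviewer judgments.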
In practice
- Use OS-Themis as the reward signal in online RL training (reported 10.3% improvement on AndroidWorld).
- Apply OS-Themis to validate and filter trajectories in self-training loops (reported 6.9% gain).
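As a rough illustration of the second use, the sketch below filters trajectories with a critic score before they are reused for self-training. The function name `filter_for_self_training`, the `threshold` parameter, and the toy critic are assumptions made for this example; they are not part of OS-Themis.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def filter_for_self_training(
    trajectories: Iterable[T],
    critic: Callable[[T], float],
    threshold: float = 0.5,
) -> Tuple[List[T], List[T]]:
    """Split trajectories into (kept, rejected) by comparing critic score to threshold."""
    kept: List[T] = []
    rejected: List[T] = []
    for traj in trajectories:
        (kept if critic(traj) >= threshold else rejected).append(traj)
    return kept, rejected


def toy_critic(traj: List[str]) -> float:
    """Stand-in critic: a trajectory counts as successful if it ends with "done"."""
    return 1.0 if traj and traj[-1] == "done" else 0.0


candidates = [["tap", "done"], ["tap", "scroll"], []]
kept, rejected = filter_for_self_training(candidates, toy_critic)
```

Only the `kept` trajectories would be passed on as training data in the next self-training round; in online RL the same critic score could serve directly as the episode reward.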
Topics
- Reinforcement Learning
- GUI Agents
- Reward Functions
- Multi-agent Systems
- Benchmarking
Best for: AI Scientist, Research Scientist, AI Researcher, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.