ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Microsoft Research and Tsinghua University researchers have introduced ProRe, a proactive reward system designed to improve the accuracy and scalability of reward mechanisms for Large Language Model (LLM)-based Graphical User Interface (GUI) agents. Existing methods, such as rule-based systems and LLM-as-a-Judge approaches, struggle to reward GUI agents reliably because of incomplete state observability and a lack of domain-specific knowledge. ProRe addresses these issues with a general-purpose reasoner (e.g., GPT-4o) that schedules targeted state-probing tasks, which domain-specific evaluator agents then execute. These evaluators actively interact with the environment to collect additional, verifiable observations. Across more than 3,000 trajectories from benchmarks such as AndroidWorld and MobileAgentBench, ProRe improves reward accuracy by up to 5.3% (reaching an average of 93.7%) and F1 score by up to 19.4%. Integrating ProRe with policy agents further raises their success rate by up to 22.4%.
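
The reported gains are stated in reward accuracy and F1. As a reminder of how such metrics are typically computed for a binary success/failure reward, the sketch below scores a reward system's verdicts against ground-truth labels, treating "success" as the positive class. This is an assumption: the paper may define F1 on a different class or average over classes, and `reward_metrics` is a hypothetical helper, not part of ProRe.

```python
from typing import Sequence


def reward_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> dict:
    """Accuracy, precision, recall and F1 of binary success rewards.

    `predicted` holds the reward system's verdict per trajectory (True = judged successful),
    `actual` holds the ground-truth label. True is treated as the positive class (assumption).
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # success judged as success
    fp = sum(1 for p, a in pairs if p and not a)      # failure wrongly judged as success
    fn = sum(1 for p, a in pairs if not p and a)      # success wrongly judged as failure
    correct = sum(1 for p, a in pairs if p == a)      # agreements on either class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


metrics = reward_metrics([True, True, False, False, True], [True, False, False, True, True])
print(metrics)
```

Because accuracy counts agreements on both classes while positive-class F1 ignores true negatives, the two metrics can move by different amounts on the same data, especially when one outcome dominates the trajectory set.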

Key takeaway

Research scientists developing or deploying LLM-based GUI agents should consider integrating ProRe to improve reward accuracy and agent success rates. Its proactive reasoner-actor collaboration addresses two limitations of existing judges, passive observation and missing domain-specific knowledge, which makes rewards more reliable for both agent training and test-time scaling. It also offers a cost-efficient alternative to manual annotation that can improve overall system performance and generalizability.
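
One common way to use a reward system at test time, assumed here rather than taken from the paper, is as a verifier that triggers retries: the policy attempts the task, the reward system judges the resulting trajectory, and a rejected attempt is rerun. `run_with_verifier`, its `threshold`, and the environment reset between attempts are all hypothetical; the reward callable is where a ProRe-style system would plug in.

```python
from typing import Callable, List, Tuple


def run_with_verifier(
    attempt: Callable[[int], List[str]],
    reward: Callable[[List[str]], float],
    max_attempts: int = 3,
    threshold: float = 0.5,
) -> Tuple[List[str], int, bool]:
    """Retry a policy until a reward system accepts its trajectory (hypothetical helper).

    `attempt(i)` runs the policy once from a freshly reset environment (assumed) and returns
    its trajectory; `reward` scores it. Returns (trajectory, attempts_used, accepted): the
    first accepted trajectory, or the best-scoring one if none reaches the threshold.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: List[str] = []
    best_score = float("-inf")
    for i in range(max_attempts):
        trajectory = attempt(i)
        score = reward(trajectory)
        if score >= threshold:
            return trajectory, i + 1, True       # verifier accepted this attempt
        if score > best_score:                   # remember the best rejected attempt
            best, best_score = trajectory, score
    return best, max_attempts, False


# Toy usage: the first attempt misses the "Add" step, the second completes it.
candidates = [["tap Settings"], ["tap Contacts", "tap Add"]]
traj, used, accepted = run_with_verifier(
    lambda i: candidates[min(i, len(candidates) - 1)],
    lambda t: 1.0 if "tap Add" in t else 0.0,
)
print(traj, used, accepted)
```

The design trades extra environment rollouts for higher success, so the value of the loop depends directly on how often the reward system's accept/reject verdict is correct.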

Key insights

ProRe enhances GUI agent reward accuracy by proactively probing states using a reasoner-evaluator collaboration.

Principles

Method

A general-purpose LLM reasoner schedules state-probing tasks for domain-specific evaluator agents. The evaluators interact with the environment to collect key states, which are then summarized into claims. The reasoner performs chain-of-claims reasoning over these claims to assign a reward.
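
The data flow of the method can be sketched as a small pipeline: schedule probes, run each on the matching domain evaluator, summarize the collected states into claims, then reason over the claims. This is a minimal sketch under stated assumptions, not the authors' implementation: `ProbeTask`, `Claim`, `prore_style_reward`, and the toy stubs are hypothetical, the LLM and GUI-agent calls are replaced by plain functions, and the chain-of-claims step is simplified to "reward 1.0 only if every claim holds".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ProbeTask:
    """A targeted state-probing task from the reasoner to an evaluator (hypothetical schema)."""
    app: str       # domain of the evaluator that should run it, e.g. "contacts"
    question: str  # state to verify, e.g. "Does a contact named Ada exist?"


@dataclass
class Claim:
    """A verifiable statement summarized from the key states an evaluator collected."""
    task: ProbeTask
    statement: str
    holds: bool


# Hypothetical stand-ins: in a real system the scheduler and summarizer would wrap the
# general LLM reasoner, and each evaluator a domain-specific GUI agent acting on a device.
Scheduler = Callable[[str, List[str]], List[ProbeTask]]
Evaluator = Callable[[ProbeTask], List[str]]
Summarizer = Callable[[ProbeTask, List[str]], Claim]


def prore_style_reward(instruction: str, trajectory: List[str], schedule: Scheduler,
                       evaluators: Dict[str, Evaluator],
                       summarize: Summarizer) -> Tuple[float, List[Claim]]:
    tasks = schedule(instruction, trajectory)       # 1. reasoner plans probing tasks
    claims = []
    for task in tasks:
        states = evaluators[task.app](task)         # 2. domain evaluator collects key states
        claims.append(summarize(task, states))      # 3. states summarized into a claim
    # 4. Simplified stand-in for chain-of-claims reasoning: all claims must hold.
    reward = 1.0 if claims and all(c.holds for c in claims) else 0.0
    return reward, claims


# Toy usage: one probe against a fake contacts "environment".
fake_env = {"contacts": ["Ada", "Grace"]}


def toy_schedule(instruction: str, trajectory: List[str]) -> List[ProbeTask]:
    return [ProbeTask("contacts", "Does a contact named Ada exist?")]


def contacts_evaluator(task: ProbeTask) -> List[str]:
    # Stands in for an agent that opens the contacts app and reads the visible names.
    return list(fake_env["contacts"])


def toy_summarize(task: ProbeTask, states: List[str]) -> Claim:
    ok = "Ada" in states
    return Claim(task, "Contact 'Ada' exists" if ok else "Contact 'Ada' is missing", ok)


reward, claims = prore_style_reward(
    "Add a contact named Ada",
    ["open contacts", "tap add", "type Ada", "tap save"],
    toy_schedule, {"contacts": contacts_evaluator}, toy_summarize,
)
print(reward, [c.statement for c in claims])
```

The key point the sketch illustrates is that the reward is grounded in states the evaluator actively re-observes after the trajectory, not only in the screenshots or actions the policy happened to record.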

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.