MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
Summary
MAS-Bench is a benchmark for evaluating GUI-shortcut hybrid agents in the mobile domain. It fills a gap: until now there has been no systematic way to benchmark agents that combine flexible Graphical User Interface (GUI) operations with efficient shortcuts such as APIs, deep links, and Robotic Process Automation (RPA) scripts. The benchmark comprises 139 complex tasks across 11 real-world Android applications and a knowledge base of 88 predefined shortcuts. Beyond the use of predefined shortcuts, MAS-Bench also assesses an agent's ability to autonomously generate new, reusable workflows. In experiments with the Gemini-2.5-Pro model, hybrid agents achieve a 64.1% success rate compared with 44.6% for GUI-only agents, and are over 40% more efficient. The benchmark also reveals a performance gap between robust predefined shortcuts and less reliable agent-generated ones, which points to an open area for future research.
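To put the two success rates in context, the short calculation below derives the absolute and relative improvement from the figures quoted in the summary. It covers success rate only; the efficiency figure is a separate metric and is not derived here. The variable names are illustrative.

```python
# Success rates as quoted in the summary, in percent.
hybrid_success_rate = 64.1
gui_only_success_rate = 44.6

# Absolute gain, in percentage points.
absolute_gain = hybrid_success_rate - gui_only_success_rate

# Relative gain, as a percentage of the GUI-only baseline.
relative_gain = absolute_gain / gui_only_success_rate * 100

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The hybrid agents are about 19.5 percentage points ahead, which is roughly a 43.7% improvement relative to the GUI-only baseline.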
Key takeaway
Research scientists developing mobile GUI agents should prioritize hybrid GUI-shortcut operation, which improves both task success rate and operational efficiency. The focus should extend beyond using existing shortcuts to building robust frameworks that autonomously generate new, efficient shortcuts, especially for repetitive sub-tasks. The benchmark results suggest this makes agents more capable and adaptable, with the largest gains for less powerful base models.
Key insights
Hybrid GUI-shortcut agents significantly boost mobile task success and efficiency over GUI-only approaches.
Principles
- Predefined shortcuts enhance agent performance across diverse frameworks.
- Weaker base models gain more from shortcut integration.
- Agent-generated shortcuts require improved robustness and efficiency.
Method
MAS-Bench evaluates agents in a dynamic Android environment using 139 tasks, 88 predefined shortcuts (APIs, deep links, RPA scripts), and 7 metrics, including a framework for assessing agent-generated shortcut quality.
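As a rough illustration of this setup, the sketch below models a shortcut knowledge base entry and two simple per-run metrics. The summary does not name the seven metrics, so `success_rate` and `mean_steps` are stand-ins chosen for illustration, and every class, field, and value here is an assumption rather than MAS-Bench's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ShortcutKind(Enum):
    # The three shortcut types named in the summary.
    API = "api"
    DEEP_LINK = "deep_link"
    RPA_SCRIPT = "rpa_script"


@dataclass(frozen=True)
class Shortcut:
    # Hypothetical knowledge base entry; the field names are assumptions.
    name: str
    kind: ShortcutKind
    app: str


@dataclass(frozen=True)
class TaskResult:
    # Hypothetical record of one agent run on one task.
    task_id: str
    success: bool
    steps: int


def success_rate(results: list) -> float:
    """Fraction of tasks completed successfully (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.success) / len(results)


def mean_steps(results: list) -> float:
    """Average number of agent steps per task (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.steps for r in results) / len(results)


# Made-up example data, not benchmark results.
example_shortcut = Shortcut("open_wifi_settings", ShortcutKind.DEEP_LINK, "settings")
example_results = [
    TaskResult("task-001", True, 4),
    TaskResult("task-002", False, 10),
    TaskResult("task-003", True, 7),
]
print(success_rate(example_results), mean_steps(example_results))
```

Comparing the same two numbers for a GUI-only run and a hybrid run of the same tasks is one simple way to express the success and efficiency comparison the benchmark reports.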
In practice
- Integrate APIs, deep links, or RPA scripts for mobile task automation.
- Prioritize robust, predefined shortcuts for critical workflows.
- Focus on improving dynamic shortcut generation for adaptability.
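The points above can be sketched as a shortcut-first policy with a GUI fallback. This is a minimal illustration under stated assumptions, not the agent design evaluated in MAS-Bench: `Action`, `choose_action`, the sub-task names, and the `exampleapp://` URI are all hypothetical. The only real interface used is the Android `adb shell am start` command, which the code builds as an argument list but does not execute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    kind: str    # "shortcut" or "gui"
    detail: str  # command line or GUI plan, for illustration only


def deep_link_command(uri: str) -> List[str]:
    """Build (but do not run) an adb command that opens a deep link."""
    return [
        "adb", "shell", "am", "start",
        "-a", "android.intent.action.VIEW",
        "-d", uri,
    ]


def choose_action(
    subtask: str,
    predefined_deep_links: Dict[str, str],
    gui_planner: Callable[[str], str],
) -> Action:
    """Prefer a predefined shortcut; fall back to GUI operations otherwise."""
    uri = predefined_deep_links.get(subtask)
    if uri is not None:
        return Action("shortcut", " ".join(deep_link_command(uri)))
    return Action("gui", gui_planner(subtask))


# Hypothetical knowledge base: sub-task name -> deep link URI.
deep_links = {"open_wifi_settings": "exampleapp://settings/wifi"}


def toy_gui_planner(subtask: str) -> str:
    # Stand-in for a model-driven GUI planner (tap, swipe, type).
    return f"plan GUI steps for '{subtask}'"


print(choose_action("open_wifi_settings", deep_links, toy_gui_planner))
print(choose_action("rename_playlist", deep_links, toy_gui_planner))
```

Keeping the fallback explicit means a missing or unreliable shortcut degrades to ordinary GUI operation instead of failing the task, which matters most for agent-generated shortcuts.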
Topics
- MAS-Bench
- GUI-Shortcut Hybrid Agents
- Mobile GUI Automation
- Shortcut Generation
- Performance Benchmarking
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.