Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?
Summary
A new analysis argues that mobile agents should consider the command-line interface (CLI) as a primary interaction method, alongside the dominant graphical user interface (GUI) paradigm. Researchers evaluated three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on the AndroidWorld and MobileWorld benchmarks. Claude Code (Opus 4.7) achieved 71.8% on AndroidWorld and 51.9% on MobileWorld, surpassing all reproducible GUI baselines (e.g., 69.3% on AndroidWorld, 43.2% on MobileWorld). Oracle CLI solutions reached a ceiling of 88.8% on AndroidWorld (103/116 tasks) and 86.3% on MobileWorld (101/117 tasks), which leaves considerable headroom above the best agent's results. On a new CLI-Advantage Task Suite, with 45 templates across five categories such as bulk operations and cross-app workflows, CLI agents outperformed GUI baselines in every category and used substantially fewer steps (10.7 vs. 18.6). The team will open-source their implementations and evaluation infrastructure.
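The percentages above can be checked, and the remaining headroom made explicit, with a few lines of arithmetic. The sketch below uses only the figures quoted in this summary; the function names are illustrative and are not taken from the paper.

```python
# Figures quoted in the summary; nothing here comes from outside it.
ORACLE = {"AndroidWorld": (103, 116), "MobileWorld": (101, 117)}  # (solved, total)
BEST_CLI = {"AndroidWorld": 71.8, "MobileWorld": 51.9}  # Claude Code success rates, %


def pct(solved: int, total: int) -> float:
    """Success rate as a percentage."""
    return 100.0 * solved / total


def headroom(benchmark: str) -> float:
    """Percentage points between the oracle ceiling and the best reported CLI agent."""
    solved, total = ORACLE[benchmark]
    return pct(solved, total) - BEST_CLI[benchmark]


def step_reduction(cli_steps: float, gui_steps: float) -> float:
    """Relative reduction in steps, as a percentage of the GUI step count."""
    return 100.0 * (1.0 - cli_steps / gui_steps)


if __name__ == "__main__":
    for name, (solved, total) in ORACLE.items():
        print(f"{name}: ceiling {pct(solved, total):.1f}%, "
              f"headroom {headroom(name):.1f} points")
    print(f"Step reduction: {step_reduction(10.7, 18.6):.1f}%")
```

By this calculation the oracle ceiling sits about 17 points above the best agent on AndroidWorld and about 34 points above it on MobileWorld, and the CLI agents use roughly 42% fewer steps on the CLI-Advantage Task Suite.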
Key takeaway
AI engineers building mobile agents should seriously consider adding command-line interface (CLI) capabilities. In this study, CLI agents outperformed the reproducible GUI baselines on AndroidWorld and MobileWorld, and on the CLI-Advantage Task Suite they needed fewer interaction steps. Direct access to device services lets an agent reach higher task completion rates with fewer steps. Once the authors release it, the CLI-Advantage Task Suite can be used to develop agents capable of bulk operations, multi-condition filtering, and cross-app workflows that are difficult through the GUI alone.
Key insights
In this evaluation, CLI-based mobile agents outperformed the reproducible GUI baselines, using direct device access to complete complex tasks in fewer steps.
Principles
- CLI offers direct access to device services.
- CLI agents can handle tasks beyond GUI scope.
- Efficiency gains from fewer interaction steps.
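To make "direct access to device services" concrete, here is a minimal sketch of how an agent might issue Android shell commands through `adb` instead of tapping through screens. It is an illustration under assumptions, not the paper's tooling: the summary does not describe how the evaluated agents reach the device, and the helper names (`adb_shell`, `list_packages`, `open_url`, `get_setting`, `run`) are hypothetical. The underlying commands (`pm list packages`, `am start`, `settings get`) are standard Android shell tools.

```python
import shlex
import subprocess
from typing import List, Optional


def adb_shell(command: str, serial: Optional[str] = None) -> List[str]:
    """Build an argv list that runs `command` in the device shell via adb."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]  # target one specific device or emulator
    return argv + ["shell", command]


def list_packages(serial: Optional[str] = None) -> List[str]:
    """One command lists every installed package; no scrolling through a launcher."""
    return adb_shell("pm list packages", serial)


def open_url(url: str, serial: Optional[str] = None) -> List[str]:
    """Fire a VIEW intent directly. The URL is quoted for the device shell."""
    return adb_shell(
        f"am start -a android.intent.action.VIEW -d {shlex.quote(url)}", serial
    )


def get_setting(namespace: str, key: str, serial: Optional[str] = None) -> List[str]:
    """Read a system setting (namespace is system, secure, or global)."""
    return adb_shell(f"settings get {namespace} {shlex.quote(key)}", serial)


def run(argv: List[str]) -> str:
    """Execute a built command. Requires adb on PATH and a connected device."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Printing the commands needs no device; pass them to run() to execute them.
    print(list_packages())
    print(open_url("https://example.com/?a=1&b=2"))
    print(get_setting("system", "screen_brightness"))
```

Each helper returns a single command that replaces a sequence of taps and scrolls, which is one plausible reading of where the step savings come from.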
Method
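The researchers evaluated three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on AndroidWorld and MobileWorld, interacting through the CLI, and compared them against three GUI baselines. They also introduced the CLI-Advantage Task Suite, 45 task templates in five categories that are difficult to complete through the GUI alone.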
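The two metrics reported here, task success rate and number of steps, can be aggregated from per-task records along the following lines. This is a sketch under assumptions: the `TaskResult` record and the `summarize` function are hypothetical, and the paper's actual evaluation infrastructure may be organized quite differently.

```python
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple


class TaskResult(NamedTuple):
    """One agent's attempt at one benchmark task (hypothetical record format)."""
    agent: str      # e.g. "claude-code" or the name of a GUI baseline
    interface: str  # "cli" or "gui"
    task: str       # task identifier within the benchmark
    success: bool   # whether the benchmark's checker accepted the final state
    steps: int      # number of agent actions taken


def summarize(results: Iterable[TaskResult]) -> Dict[str, Dict[str, float]]:
    """Return success rate (%) and mean step count for each agent."""
    by_agent: Dict[str, List[TaskResult]] = {}
    for result in results:
        by_agent.setdefault(result.agent, []).append(result)
    return {
        agent: {
            "success_rate": 100.0 * sum(r.success for r in rows) / len(rows),
            "mean_steps": mean(r.steps for r in rows),
        }
        for agent, rows in by_agent.items()
    }
```

Keeping the interface type on each record makes it straightforward to extend the grouping to CLI versus GUI, or to per-category breakdowns like those reported for the CLI-Advantage Task Suite.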
In practice
- Explore CLI for mobile agent development.
- Utilize CLI for bulk operations and cross-app workflows.
- Use the CLI-Advantage Task Suite once it is open-sourced.
Topics
- Mobile Agents
- Command-Line Interface
- GUI Paradigm
- AndroidWorld Benchmark
- CLI-Advantage Suite
- Agent Evaluation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.