Large Language Models for Mobile GUI Text Input Generation: An Empirical Study
Summary
An empirical study evaluated how effectively nine Large Language Models (LLMs) generate text inputs for mobile Graphical User Interface (GUI) testing across 115 real-world applications. The researchers compared three ways of supplying UI context in the prompt: extracted textual context, UI-hierarchy XML, and screenshot-based vision input. Extracted textual context and XML achieved comparable page-pass-through rates (PPTRs) of 71.4% and 71.0%, respectively, while vision-based input reached 65.1% at a significantly higher token cost. In bug-detection experiments on 37 real-world text-input bugs, LLMs prompted to generate invalid inputs detected approximately 51% of the bugs. A feedback-enhanced protocol, which fed execution outcomes back into generation, further improved average PPTRs to 69.2-73.8% and raised bug-detection rates to 51.0-64.5%. Human intervention provided additional gains, and the approach was integrated into DroidBot to strengthen its UI exploration.
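The two headline metrics can be illustrated with a minimal sketch. It assumes PPTR is the share of attempted pages on which the generated text input let the app move past the page, and that the bug-detection rate is the share of known bugs triggered by at least one generated input. The `PageResult` record and the toy data are hypothetical and do not reproduce the study's results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageResult:
    """One trial: a model fills one page's text fields using one context mode (hypothetical schema)."""
    model: str
    mode: str      # "text", "xml", or "vision"
    passed: bool   # did the app advance past the page after the input was submitted?


def pptr_by_mode(results: list[PageResult]) -> dict[str, float]:
    """Page-pass-through rate per context mode: passed pages / attempted pages."""
    attempted: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for r in results:
        attempted[r.mode] += 1
        passed[r.mode] += int(r.passed)
    return {mode: passed[mode] / attempted[mode] for mode in attempted}


def bug_detection_rate(triggered: set[str], known_bugs: set[str]) -> float:
    """Fraction of known text-input bugs triggered by at least one generated input."""
    if not known_bugs:
        raise ValueError("known_bugs must not be empty")
    return len(triggered & known_bugs) / len(known_bugs)


if __name__ == "__main__":
    toy = [
        PageResult("model-a", "text", True),
        PageResult("model-a", "text", False),
        PageResult("model-a", "xml", True),
        PageResult("model-a", "vision", False),
    ]
    print(pptr_by_mode(toy))  # {'text': 0.5, 'xml': 1.0, 'vision': 0.0}
    print(bug_detection_rate({"b1", "b3"}, {"b1", "b2", "b3", "b4"}))  # 0.5
```

Grouping by context mode mirrors the study's comparison axis; averaging per model as well would only need a second key.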
Key takeaway
If you are a mobile GUI test automation engineer aiming to improve text input generation, integrate Large Language Models into your testing workflow. Prioritize extracted textual context or UI-hierarchy XML for prompting: both perform strongly (71.4% and 71.0% PPTRs) without the high token costs of vision-based input. Implement a feedback-enhanced protocol that refines LLM-generated inputs based on execution outcomes, which can raise bug-detection rates to as much as 64.5%. Consider human-in-the-loop refinement for complex scenarios.
Key insights
LLMs effectively generate mobile GUI text inputs for testing, with performance varying by UI context and improving with feedback.
Principles
- UI context representation affects LLM testing efficacy and cost.
- Iterative feedback loops enhance LLM-based test input generation.
- Human oversight improves automated LLM testing outcomes.
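To make the first principle concrete, here is a minimal sketch of condensing a UI-hierarchy dump into a compact textual context for each input field. It assumes an Android uiautomator-style XML dump whose `node` elements carry `class`, `text`, and `resource-id` attributes. The labelling heuristic (sibling `TextView` text is treated as the field's label) and the prompt wording are illustrative assumptions, not the study's extraction procedure. Character length serves only as a rough proxy for token cost.

```python
import xml.etree.ElementTree as ET

# Tiny uiautomator-style hierarchy (illustrative; real dumps carry many more attributes).
DUMP = """<hierarchy>
  <node class="android.widget.LinearLayout" text="" resource-id="">
    <node class="android.widget.TextView" text="Email address" resource-id="com.example:id/label_email"/>
    <node class="android.widget.EditText" text="" resource-id="com.example:id/email"/>
    <node class="android.widget.Button" text="Sign in" resource-id="com.example:id/submit"/>
  </node>
</hierarchy>"""


def _texts(nodes, suffix: str) -> list[str]:
    """Non-empty `text` values of nodes whose class name ends with `suffix`."""
    return [n.get("text") for n in nodes
            if n.get("class", "").endswith(suffix) and n.get("text")]


def describe_inputs(xml_text: str) -> list[str]:
    """One line per EditText; sibling TextViews are treated as its label (a heuristic)."""
    root = ET.fromstring(xml_text)
    lines = []
    for parent in root.iter("node"):
        children = list(parent)
        label = ", ".join(_texts(children, "TextView")) or "(none)"
        actions = ", ".join(_texts(children, "Button")) or "(none)"
        for child in children:
            if child.get("class", "").endswith("EditText"):
                field = child.get("resource-id", "").split("/")[-1] or "(unnamed)"
                lines.append(f"Field '{field}' (label: {label}); page buttons: {actions}")
    return lines


def build_prompt(xml_text: str) -> str:
    """Compact textual-context prompt; the instruction wording is a hypothetical template."""
    return ("Generate a valid text input for each field so the page can be submitted.\n"
            + "\n".join(describe_inputs(xml_text)))


if __name__ == "__main__":
    prompt = build_prompt(DUMP)
    print(prompt)
    # Character length is only a rough stand-in for token count.
    print(len(prompt), "chars of textual context vs", len(DUMP), "chars of raw XML")
```

Passing the raw `DUMP` string instead of `build_prompt(DUMP)` would correspond to the XML-context variant; the trade-off is more tokens for less hand-written extraction logic.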
Method
An empirical study evaluated nine LLMs using textual context, UI-hierarchy XML, and vision input, then applied a feedback-enhanced protocol incorporating execution outcomes to refine inputs for mobile GUI testing.
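The feedback-enhanced protocol can be pictured as a retry loop: generate an input, execute it, and fold the observed outcome into the next prompt. The sketch below assumes a generic `generate(prompt) -> str` LLM call and an `execute(value) -> (passed, message)` harness step. Both are hypothetical stand-ins, stubbed here so the example runs, and the default round limit is an arbitrary choice rather than the study's setting.

```python
from typing import Callable, Optional

Generator = Callable[[str], str]              # hypothetical LLM call: prompt -> candidate input
Executor = Callable[[str], tuple[bool, str]]  # hypothetical harness: input -> (passed, feedback)


def feedback_loop(base_prompt: str, generate: Generator, execute: Executor,
                  max_rounds: int = 3) -> tuple[Optional[str], list[str]]:
    """Retry generation, appending each failed outcome to the prompt.

    Returns (accepted input or None, list of feedback messages from failed rounds).
    """
    prompt = base_prompt
    history: list[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(prompt)
        passed, message = execute(candidate)
        if passed:
            return candidate, history
        history.append(f"round {round_no}: input {candidate!r} rejected: {message}")
        # Feed the failure back so the next generation can correct it.
        prompt = base_prompt + "\nPrevious attempts:\n" + "\n".join(history)
    return None, history


if __name__ == "__main__":
    # Stub LLM: proposes a malformed email until feedback appears in the prompt.
    def fake_llm(prompt: str) -> str:
        return "alice@example.com" if "rejected" in prompt else "alice"

    # Stub executor: a toy validation rule standing in for the app's real behaviour.
    def fake_execute(value: str) -> tuple[bool, str]:
        ok = "@" in value
        return ok, "ok" if ok else "invalid email format"

    accepted, log = feedback_loop("Fill the 'email' field.", fake_llm, fake_execute)
    print(accepted)  # alice@example.com
    print(log)
```

Returning `None` after the last round, rather than raising, leaves room for a caller to hand the field to a person or to a different strategy.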
In practice
- Prioritize textual context or UI-hierarchy XML for LLM prompting.
- Implement feedback loops from execution results to refine LLM inputs.
- Augment automated LLM testing with human review for critical inputs.
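One way to apply these practices together is a triage step that decides which fields go to a human reviewer. This is a hypothetical policy sketch: the `FieldOutcome` record, the keyword rule for "critical" fields, and the escalation criteria are assumptions for illustration. The study reports that human intervention helped, but this code does not reproduce its procedure.

```python
from dataclasses import dataclass
from typing import Optional

# Substrings marking fields a team might treat as critical (illustrative assumption).
CRITICAL_KEYWORDS = ("password", "card", "payment", "otp")


@dataclass
class FieldOutcome:
    field_id: str
    accepted_input: Optional[str]  # None if the feedback loop never produced a passing input


def needs_human_review(outcome: FieldOutcome) -> bool:
    """Escalate when automation gave up, or when the field id looks critical."""
    if outcome.accepted_input is None:
        return True
    return any(k in outcome.field_id.lower() for k in CRITICAL_KEYWORDS)


def triage(outcomes: list[FieldOutcome]) -> tuple[list[str], list[str]]:
    """Split field ids into (auto_accepted, review_queue)."""
    auto: list[str] = []
    review: list[str] = []
    for o in outcomes:
        (review if needs_human_review(o) else auto).append(o.field_id)
    return auto, review


if __name__ == "__main__":
    auto, review = triage([
        FieldOutcome("email", "alice@example.com"),
        FieldOutcome("new_password", "S3cure!pass"),
        FieldOutcome("zip_code", None),
    ])
    print(auto)    # ['email']
    print(review)  # ['new_password', 'zip_code']
```

Substring matching is deliberately crude; a real policy would more likely key on field metadata such as a password input type.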
Topics
- Large Language Models
- Mobile GUI Testing
- Text Input Generation
- UI Context
- Automated Software Testing
- DroidBot
Best for: Research Scientist, AI Scientist, AI Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.