Large Language Models for Mobile GUI Text Input Generation: An Empirical Study

2024-04-13 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, short

Summary

An empirical study evaluated nine Large Language Models (LLMs) on generating text inputs for mobile Graphical User Interface (GUI) testing across 115 real-world applications. The researchers compared three ways of supplying UI context in the prompt: extracted textual context, the UI-hierarchy XML, and screenshot-based vision input. Extracted textual context and XML achieved comparable page-pass-through rates (PPTRs) of 71.4% and 71.0%, respectively, while vision input reached 65.1% at a significantly higher token cost. In bug-detection experiments on 37 real-world text-input bugs, LLMs prompted to generate invalid inputs detected approximately 51% of the bugs. A feedback-enhanced protocol that fed execution outcomes back to the model improved average PPTRs to 69.2-73.8% and raised bug-detection rates to 51.0-64.5%. Human intervention provided additional gains, and the approach was integrated into DroidBot to augment its UI exploration.
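The page-pass-through rate is the study's headline metric. As a minimal sketch, assuming PPTR is the percentage of tested pages on which the generated input lets the tester move past the page (a reading of the metric's name, not a definition quoted from the paper), it can be computed as below. `PageAttempt` and `page_pass_through_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageAttempt:
    """One GUI page on which a generated text input was tried (hypothetical record)."""
    page_id: str
    passed_through: bool  # True if the input let the tester reach the next page


def page_pass_through_rate(attempts: list[PageAttempt]) -> float:
    """Return PPTR as a percentage: pages passed through / pages attempted."""
    if not attempts:
        raise ValueError("PPTR is undefined for an empty set of pages")
    passed = sum(1 for a in attempts if a.passed_through)
    return 100.0 * passed / len(attempts)


# Usage: three of four pages passed through.
runs = [
    PageAttempt("login", True),
    PageAttempt("search", True),
    PageAttempt("signup", False),
    PageAttempt("profile", True),
]
print(f"{page_pass_through_rate(runs):.1f}%")  # 75.0%
```

Computing the rate per model and per context type (text, XML, vision) is what makes figures such as 71.4% versus 65.1% comparable.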

Key takeaway

Mobile GUI test automation engineers looking to improve text input generation should integrate LLMs into their testing workflows. Prefer extracted textual context or UI-hierarchy XML when prompting, since these achieved the strongest PPTRs (71.4% and 71.0%) without the higher token costs of vision-based input. Add a feedback-enhanced protocol that refines LLM-generated inputs based on execution outcomes; in the study this raised bug-detection rates to as high as 64.5%. Consider human-in-the-loop refinement for complex scenarios.
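To make the extracted-textual-context option concrete, here is a sketch that assumes an Android `uiautomator`-style XML dump in which each widget is a `<node>` element carrying `class`, `text`, `content-desc`, and `resource-id` attributes. The attribute selection, the prompt wording, and the names `extract_text_context`, `build_prompt`, and `SAMPLE_XML` are illustrative assumptions, not the prompts used in the study. The UI-hierarchy XML option would instead place the raw dump in the prompt.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to carry human-readable context in the dump.
CONTEXT_ATTRS = ("text", "content-desc", "resource-id")


def extract_text_context(ui_xml: str) -> list[str]:
    """Collect non-empty textual attributes from every <node> in a UI-hierarchy dump."""
    root = ET.fromstring(ui_xml)
    lines = []
    for node in root.iter("node"):
        parts = [f'{attr}="{node.get(attr)}"' for attr in CONTEXT_ATTRS if node.get(attr)]
        if parts:
            # Keep only the short widget class name, e.g. "EditText".
            widget = node.get("class", "").rsplit(".", 1)[-1]
            lines.append(f"{widget}: " + ", ".join(parts))
    return lines


def build_prompt(ui_xml: str, target_resource_id: str) -> str:
    """Assemble a hypothetical prompt asking the LLM to fill one input field."""
    context = "\n".join(extract_text_context(ui_xml))
    return (
        "You are testing an Android app. The current page contains:\n"
        f"{context}\n"
        f"Suggest a valid text input for the field with resource-id '{target_resource_id}'. "
        "Reply with the input only."
    )


# A tiny hand-written dump standing in for a real uiautomator capture.
SAMPLE_XML = """<hierarchy rotation="0">
  <node class="android.widget.TextView" text="Sign in" resource-id="" content-desc=""/>
  <node class="android.widget.EditText" text="" resource-id="com.example:id/email" content-desc="Email address"/>
</hierarchy>"""

print(build_prompt(SAMPLE_XML, "com.example:id/email"))
```

Dropping empty attributes and layout-only nodes keeps the prompt short, which is one plausible reason a textual summary can match raw XML at lower cost.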

Key insights

LLMs effectively generate mobile GUI text inputs for testing, with performance varying by UI context and improving with feedback.

Principles

UI context representation affects LLM testing efficacy and cost.
Iterative feedback loops enhance LLM-based test input generation.
Human oversight improves automated LLM testing outcomes.

Method

An empirical study evaluated nine LLMs using textual context, UI-hierarchy XML, and vision input, then applied a feedback-enhanced protocol incorporating execution outcomes to refine inputs for mobile GUI testing.
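The feedback-enhanced protocol can be sketched as a bounded retry loop. This assumes two hypothetical callables you would supply: `ask_llm`, which sends a prompt to a model and returns a candidate string, and `execute`, which enters the candidate in the app and reports whether the page was passed together with any observed message (for example, a validation error). The round limit and feedback wording are assumptions; the study's exact protocol may differ.

```python
from typing import Callable, Optional

# Hypothetical interfaces: prompt -> candidate input; candidate -> (passed, observation).
AskLLM = Callable[[str], str]
Execute = Callable[[str], tuple[bool, str]]


def feedback_enhanced_input(base_prompt: str, ask_llm: AskLLM, execute: Execute,
                            max_rounds: int = 3) -> Optional[str]:
    """Retry generation, appending each rejected input and the app's response to the prompt."""
    prompt = base_prompt
    for _ in range(max_rounds):
        candidate = ask_llm(prompt)
        passed, observation = execute(candidate)
        if passed:
            return candidate
        # Feed the execution outcome back so the next suggestion can avoid the same error.
        prompt += (
            f"\nPrevious input '{candidate}' was rejected. "
            f"The app responded: {observation}. Suggest a different input."
        )
    return None  # no accepted input within the round budget


# Usage with stubs: the first suggestion fails validation, the second passes.
suggestions = iter(["abc", "user@example.com"])


def fake_llm(prompt: str) -> str:
    return next(suggestions)


def fake_execute(text: str) -> tuple[bool, str]:
    ok = "@" in text
    return ok, ("" if ok else "Invalid email format")


print(feedback_enhanced_input("Fill the email field.", fake_llm, fake_execute))  # user@example.com
```

Bounding the rounds keeps token spend predictable; returning `None` lets the caller fall back to a default input or escalate to a human.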

In practice

Prioritize textual context or UI-hierarchy XML for LLM prompting.
Implement feedback loops from execution results to refine LLM inputs.
Augment automated LLM testing with human review for critical inputs.
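Human review for critical inputs can be added as a simple routing step after generation. In this sketch, the set of critical fields and the `reviewer` callback are hypothetical; in practice the reviewer might be a console prompt or a review queue rather than the stub shown.

```python
from typing import Callable

# reviewer(field, proposed_value) -> value actually used (hypothetical callback).
Reviewer = Callable[[str, str], str]


def route_for_review(generated: dict[str, str], critical_fields: set[str],
                     reviewer: Reviewer) -> dict[str, str]:
    """Keep LLM inputs for ordinary fields; let a human confirm or replace critical ones."""
    final: dict[str, str] = {}
    for field, value in generated.items():
        if field in critical_fields:
            final[field] = reviewer(field, value)  # human accepts or overrides
        else:
            final[field] = value  # used as generated, no review
    return final


# Usage with a stub reviewer that swaps in a well-known test card number.
proposed = {"email": "qa.user@example.com", "card_number": "1234"}
approved = route_for_review(
    proposed,
    critical_fields={"card_number"},
    reviewer=lambda field, value: "4111111111111111",
)
print(approved)  # {'email': 'qa.user@example.com', 'card_number': '4111111111111111'}
```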

Topics

Large Language Models
Mobile GUI Testing
Text Input Generation
UI Context
Automated Software Testing
DroidBot

Best for: Research Scientist, AI Scientist, AI Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.