Understanding Automated Web GUI Testing: An Empirical Study Across Exploration Strategies and State Abstractions
Summary
This empirical study of automated web GUI testing (AWGT) analyzes how exploration strategies and state abstractions jointly affect testing effectiveness, measured by code coverage and failure revelation. It compares five AWGT approaches (Crawljax, FragGen, WebExplor, WebRLED, and GPTWeb) spanning three categories: model-based, reinforcement learning (RL)-based, and large language model (LLM)-based. The study evaluates six state abstraction techniques for the model-based and RL-based approaches and four history representations for the LLM-based approaches, on six open-source web applications with a 30-minute time budget. No single strategy consistently outperforms the others, although RL-based approaches generally achieve the highest code coverage. Strict, fine-grained state abstractions benefit model-based strategies, while compact abstractions suit RL-based ones. For LLM-based approaches, concise, functionality-level history representations are the most effective; verbose state histories degrade performance, and an unlimited context increases cost by nearly 20x. The study also finds no strong correlation between code coverage and failure-revealing ability.
Key takeaway
AI engineers designing or selecting automated web GUI testing (AWGT) solutions should not expect any single approach to be universally superior. Align the state abstraction technique with the exploration strategy: strict, fine-grained abstractions for model-based systems and compact ones for RL-based systems. With LLM-based AWGT, prefer concise, functionality-level history representations over verbose state descriptions to improve effectiveness and control cost. Evaluate AWGT solutions on both code coverage and failure-revealing ability, because the two metrics are not strongly correlated.
Key insights
The effectiveness of automated web GUI testing hinges on matching exploration strategies with appropriate state abstraction techniques.
Principles
- No single AWGT strategy consistently excels.
- State abstraction is critical for testing effectiveness.
- Code coverage does not strongly correlate with failure revelation.
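One way to check the relationship between coverage and failure revelation on your own test runs is a rank correlation. The sketch below computes Spearman's rho with the standard library only; the choice of Spearman is an assumption made here for illustration, not necessarily the statistic used in the study, and the sample numbers are invented placeholders rather than results from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x ** 0.5 * var_y ** 0.5)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Invented placeholder data: one entry per test run (NOT from the study).
coverage_percent = [41.0, 55.5, 38.2, 60.1, 47.3]
failures_found = [3, 2, 4, 2, 5]
print(f"Spearman rho: {spearman(coverage_percent, failures_found):.2f}")
```

A rho close to zero on real runs would be consistent with the study's finding; a value near 1 or -1 would suggest that, for that application, one metric is a usable proxy for the other.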
Method
The study compares model-based, RL-based, and LLM-based AWGT approaches, pairing each with different state abstraction or history representation techniques, and evaluates code coverage and failure revelation on open-source web applications.
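To make the difference between a strict and a compact state abstraction concrete, here is a minimal sketch. It is not one of the six techniques evaluated in the study; the two functions `strict_state_id` and `compact_state_id`, and the rule of keying interactive elements by tag and `type`, are hypothetical choices made for illustration.

```python
import hashlib
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class _Collector(HTMLParser):
    """Walks an HTML snapshot and records material for both abstractions."""

    def __init__(self):
        super().__init__()
        self.strict_parts = []      # every tag, attribute, and text node, in order
        self.compact_parts = set()  # only the kinds of interactive elements seen

    def handle_starttag(self, tag, attrs):
        normalized = sorted((name, value or "") for name, value in attrs)
        self.strict_parts.append(f"<{tag} {normalized}>")
        if tag in INTERACTIVE_TAGS:
            # Hypothetical rule: identify a widget by its tag and type only.
            kind = dict(attrs).get("type") or ""
            self.compact_parts.add(f"{tag}:{kind}")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.strict_parts.append(text)


def _digest(parts):
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:12]


def strict_state_id(html):
    """Fine-grained: any change in structure, attributes, or text is a new state."""
    collector = _Collector()
    collector.feed(html)
    return _digest(collector.strict_parts)


def compact_state_id(html):
    """Coarse: pages offering the same kinds of widgets map to the same state."""
    collector = _Collector()
    collector.feed(html)
    return _digest(sorted(collector.compact_parts))


page_a = '<div><h1>Cart (1)</h1><button type="submit">Buy</button></div>'
page_b = '<div><h1>Cart (2)</h1><button type="submit">Buy</button></div>'
print(strict_state_id(page_a) == strict_state_id(page_b))    # False
print(compact_state_id(page_a) == compact_state_id(page_b))  # True
```

The strict variant produces many distinct states, which gives a model-based explorer a detailed graph to traverse. The compact variant keeps the state space small, which is the property the study associates with better results for RL-based approaches.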
In practice
- Match state abstraction to exploration strategy.
- Prioritize functionality history for LLM-based AWGT.
- Evaluate AWGT using both coverage and failure metrics.
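The second point above can be illustrated by comparing two ways of building the history that is passed to an LLM. This is a sketch under stated assumptions: the `Step` record, both formatting functions, and the sample steps are hypothetical, and the summary does not describe the four history representations from the study in enough detail to reproduce them. Character count stands in as a rough proxy for prompt cost.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str          # what the tester did, e.g. "click #settings"
    functionality: str   # short label for the feature being exercised
    page_text: str       # raw page state observed after the action


def verbose_history(steps):
    """Every action followed by the full page state; grows with each step."""
    return "\n".join(
        f"{i}. {s.action}\nSTATE: {s.page_text}" for i, s in enumerate(steps, 1)
    )


def functionality_history(steps):
    """One line per distinct functionality exercised, with a visit count."""
    counts = {}
    for s in steps:
        counts[s.functionality] = counts.get(s.functionality, 0) + 1
    return "\n".join(f"- {name} (x{n})" for name, n in counts.items())


# Invented sample run; the page text is padded to mimic a real DOM dump.
dump = "Boards | Settings | Log out | New board | Archive " * 20
steps = [
    Step("click #new-board", "create board", dump),
    Step("type #title 'Sprint 1'", "create board", dump),
    Step("click #settings", "open settings", dump),
]

print(functionality_history(steps))
print(len(verbose_history(steps)), "chars vs", len(functionality_history(steps)))
```

The concise form stays bounded by the number of distinct functionalities, while the verbose form grows with every step, which is in line with the cost increase the study reports for unlimited context.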
Topics
- Web GUI Testing
- Test Exploration Strategies
- GUI State Abstraction
- Reinforcement Learning
- Large Language Models
- Test Effectiveness Evaluation
Code references
- pagekit/pagekit
- jeka-kiselyov/dimeshift
- bigardone/phoenix-trello
- antoinejaussoin/retro-board
- tsubik/splittypie
Best for: Research Scientist, AI Scientist, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.