Understanding Automated Web GUI Testing: An Empirical Study Across Exploration Strategies and State Abstractions

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

An empirical study of automated web GUI testing (AWGT) analyzed the joint impact of exploration strategies and state abstractions on testing effectiveness, measured by code coverage and failure revelation. The study compared five AWGT approaches (Crawljax, FragGen, WebExplor, WebRLED, and GPTWeb) spanning three categories: model-based, reinforcement learning (RL)-based, and large language model (LLM)-based. It investigated six state abstraction techniques for the model-based and RL-based approaches and four history representations for the LLM-based approaches, on six open-source web applications with a 30-minute time budget. The findings indicate that no single strategy consistently outperforms the others, although RL-based approaches generally achieve the highest code coverage. Strict, fine-grained state abstractions benefit model-based strategies, while compact abstractions support RL-based ones. For LLM-based approaches, concise, functionality-level history representations proved most effective; verbose state histories degraded performance and, with unlimited context, increased costs by nearly 20x. The study also found no strong correlation between code coverage and failure-revealing ability.
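To check on your own test runs whether coverage and failure revelation move together, a rank correlation is one reasonable option. The following is a minimal sketch using only the standard library; the summary does not say which statistic the study used, so Spearman's coefficient here is an assumption, and the function names are illustrative.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(coverage, failures):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(coverage), ranks(failures))
```

Pass one coverage value and one failure count per run (same order in both lists). A coefficient near zero would be consistent with the study's finding that the two metrics are not strongly correlated.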

Key takeaway

AI engineers designing or selecting automated web GUI testing (AWGT) solutions should not expect any single approach to be universally superior. Align your state abstraction technique with your chosen exploration strategy: strict, fine-grained abstractions for model-based systems, and compact ones for RL-based systems. When using LLM-based AWGT, prioritize concise, functionality-level history representations over verbose state descriptions to improve effectiveness and manage costs. Always evaluate AWGT solutions on both code coverage and failure-revealing capability, because the two metrics are not strongly correlated.
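The difference between a strict and a compact state abstraction can be illustrated with a small sketch. The two functions below are hypothetical examples written for this summary, not any of the six techniques evaluated in the study: one treats every change in markup or text as a new state, the other collapses pages that expose the same actionable elements.

```python
import hashlib
import re
from html.parser import HTMLParser

ACTIONABLE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class _ActionableCollector(HTMLParser):
    """Collects (tag, id-or-name) pairs for elements a tester could act on."""

    def __init__(self):
        super().__init__()
        self.actionable = []

    def handle_starttag(self, tag, attrs):
        if tag in ACTIONABLE_TAGS:
            attr_map = dict(attrs)
            key = attr_map.get("id") or attr_map.get("name") or ""
            self.actionable.append((tag, key))


def _digest(text):
    """Short, stable identifier for a state signature."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def strict_abstraction(html):
    """Fine-grained: any change in markup or text yields a different state ID."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return _digest(normalized)


def compact_abstraction(html):
    """Coarse: pages with the same set of actionable elements share a state ID."""
    collector = _ActionableCollector()
    collector.feed(html)
    signature = "|".join(f"{tag}#{key}" for tag, key in sorted(set(collector.actionable)))
    return _digest(signature)
```

Under these assumptions, two cart pages that differ only in a displayed item count would be distinct states for `strict_abstraction` but a single state for `compact_abstraction`, which keeps the state space small at the cost of missing content-level differences.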

Key insights

The effectiveness of automated web GUI testing hinges on matching exploration strategies with appropriate state abstraction techniques.

Method

The study compared model-based, RL-based, and LLM-based AWGT approaches, paired them with a range of state abstraction and history representation techniques, and evaluated code coverage and failure revelation on six open-source web applications.
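For the LLM-based category, the contrast between a verbose state history and a concise, functionality-level one can be sketched as follows. The `Step` type and both formatting functions are hypothetical illustrations; the summary does not describe the four history representations the study actually evaluated.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str         # e.g. "click #submit"
    page_html: str      # raw page state observed after the action
    functionality: str  # short label for the feature the action exercised


def verbose_history(steps):
    """State-level history: every action followed by the full page state."""
    return "\n".join(
        f"{index}. {step.action}\n{step.page_html}"
        for index, step in enumerate(steps, 1)
    )


def concise_history(steps):
    """Functionality-level history: one line per feature, with a visit count."""
    counts = {}
    for step in steps:
        counts[step.functionality] = counts.get(step.functionality, 0) + 1
    return "\n".join(f"- {name} (x{count})" for name, count in counts.items())
```

The concise form grows with the number of distinct features exercised rather than with the number of steps, which is one plausible reason a functionality-level history would keep prompt size and cost down over a long testing session.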

Best for: Research Scientist, AI Scientist, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.