Avenir-UX: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding
Summary
OpenFlo is a new user-experience (UX) evaluation agent that automates web usability testing by simulating human interaction with websites. Built on the Avenir-Web framework (Li et al., 2026), OpenFlo uses GUI grounding to interact with real web pages, rather than relying solely on DOM parsing as traditional tools do. The system combines simulated user behavior profiles with a structured evaluation protocol: the System Usability Scale (SUS) (Brooke, 1996), step-wise Single Ease Questions (SEQ) (Sauro and Dumas, 2009), and concurrent Think Aloud reasoning. The resulting UX reports pair quantitative scores with qualitative insights and pinpoint specific friction points. A case study on Recreation.gov yielded a SUS score of 55.0/100 (Grade D), while a Discogs task scored 87.5/100 (Grade A+), illustrating that the agent can operate on complex real-world sites and produce actionable feedback.
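SUS scores like the two above are conventionally computed with Brooke's standard rule: ten 1-5 Likert items, where odd (positively worded) items contribute response minus 1, even (negatively worded) items contribute 5 minus response, and the sum is multiplied by 2.5. The minimal Python sketch below implements that rule. The letter grades assume the Sauro-Lewis curved grading scale, which is consistent with the reported D and A+ grades but is not confirmed by the source; the function names and the example responses are hypothetical, not the paper's data.

```python
from typing import Sequence

# Assumption: Sauro-Lewis curved grading scale (lower bound, grade),
# checked from the highest band down. The source does not name the scale.
_SUS_GRADES = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"),
    (77.2, "B+"), (74.1, "B"), (72.6, "B-"),
    (71.1, "C+"), (65.0, "C"), (62.7, "C-"),
    (51.7, "D"), (0.0, "F"),
]


def sus_score(responses: Sequence[int]) -> float:
    """Convert ten 1-5 Likert responses into a 0-100 SUS score (Brooke, 1996)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be in 1..5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded (r - 1); even items negatively worded (5 - r).
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


def sus_grade(score: float) -> str:
    """Map a SUS score to a letter grade under the assumed curved scale."""
    for lower, grade in _SUS_GRADES:
        if score >= lower:
            return grade
    return "F"


if __name__ == "__main__":
    # Hypothetical response vectors, not taken from the paper.
    for resp in ([4, 3, 3, 3, 4, 3, 3, 3, 3, 3], [5, 2, 5, 1, 4, 2, 5, 1, 4, 2]):
        s = sus_score(resp)
        print(f"{s:.1f} {sus_grade(s)}")  # prints "55.0 D", then "87.5 A+"
```

The two hypothetical vectors reproduce the reported scores only numerically: raw sums of 22 and 35 become 55.0 and 87.5 after the 2.5 multiplier.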
Key takeaway
For research scientists developing web applications, OpenFlo offers a way to integrate continuous, automated UX evaluation directly into the software development lifecycle. Consider adopting it to automate usability testing and iterate faster, using its combined quantitative (SUS, SEQ) and qualitative (Think Aloud) reporting to locate specific friction points before they reach users.
Key insights
OpenFlo automates web UX evaluation by simulating human interaction with GUI grounding and combining quantitative and qualitative metrics.
Principles
- Visual grounding improves web agent robustness.
- Combine quantitative and qualitative UX metrics.
- Automated UX evaluation accelerates development cycles.
Method
OpenFlo employs a multimodal grounding approach that combines DOM parsing with coordinate-based visual tagging. A central multimodal large language model (MLLM) reasons about the next action to take, while an adaptive memory and a dynamic checklist maintain task context. The evaluation pipeline includes concurrent Think Aloud, step-wise SEQ ratings, and a post-task SUS assessment.
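The source does not detail how DOM parsing and coordinate-based visual tagging are combined. One common pattern, assumed here purely for illustration, is Set-of-Mark-style prompting: interactable elements extracted from the DOM receive numeric tags drawn on the screenshot, the MLLM answers with a tag, and the executor clicks the center of that element's bounding box. All names below (`Element`, `tag_elements`, `describe`, `click_point`) and the input dictionary layout are hypothetical, not OpenFlo's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    tag: int                                 # numeric label drawn on the screenshot
    role: str                                # e.g. "button", "link", "textbox" (from the DOM)
    text: str                                # visible text or accessible name
    box: Tuple[float, float, float, float]   # (left, top, width, height) in CSS px


def tag_elements(raw: List[dict]) -> Dict[int, Element]:
    """Assign sequential tags to DOM elements that have a visible box."""
    tagged: Dict[int, Element] = {}
    for item in raw:
        left, top, width, height = item["box"]
        if width <= 0 or height <= 0:
            continue  # skip hidden or collapsed nodes; they cannot be clicked
        tag = len(tagged) + 1
        tagged[tag] = Element(tag, item["role"], item["text"], (left, top, width, height))
    return tagged


def describe(tagged: Dict[int, Element]) -> str:
    """Text listing that would accompany the annotated screenshot in the MLLM prompt."""
    return "\n".join(f"[{e.tag}] {e.role}: {e.text}" for e in tagged.values())


def click_point(tagged: Dict[int, Element], tag: int) -> Tuple[float, float]:
    """Resolve the model's chosen tag to the center of its bounding box."""
    left, top, width, height = tagged[tag].box
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    # Hypothetical page elements, as a DOM extractor might report them.
    raw = [
        {"role": "textbox", "text": "Search", "box": (100, 40, 300, 32)},
        {"role": "button", "text": "Hidden", "box": (0, 0, 0, 0)},
        {"role": "button", "text": "Go", "box": (410, 40, 80, 32)},
    ]
    tagged = tag_elements(raw)
    print(describe(tagged))
    print(click_point(tagged, 2))  # prints (450.0, 56.0)
```

Tagging lets the model choose among a small, numbered set of targets instead of emitting raw pixel coordinates, while the DOM supplies roles and text that the screenshot alone may not make explicit.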
In practice
- Use OpenFlo for continuous, scalable UX testing.
- Integrate SUS and SEQ for standardized usability metrics.
- Leverage Think Aloud for qualitative insights into user friction.
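To show how step-wise SEQ ratings and Think Aloud notes can be merged into one report, here is a minimal Python sketch. It assumes each step stores the agent's action, its verbalized reasoning, and an SEQ rating on the standard 7-point scale (1 = very difficult, 7 = very easy). The record layout, the function names, and the friction threshold of 4 are hypothetical choices, not OpenFlo's documented report format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StepRecord:
    step: int
    action: str        # what the simulated user did, e.g. "click [2] Go"
    think_aloud: str   # concurrent verbalized reasoning for this step
    seq: int           # Single Ease Question rating, 1 (very difficult) to 7 (very easy)


def friction_points(steps: List[StepRecord], threshold: int = 4) -> List[StepRecord]:
    """Return steps whose SEQ rating is at or below the (assumed) threshold."""
    for s in steps:
        if not 1 <= s.seq <= 7:
            raise ValueError(f"step {s.step}: SEQ must be in 1..7")
    return [s for s in steps if s.seq <= threshold]


def summarize(steps: List[StepRecord], sus: float, threshold: int = 4) -> str:
    """Combine the post-task SUS, mean SEQ, and low-ease Think Aloud notes."""
    lines = [f"SUS: {sus:.1f}/100", f"Mean SEQ: {mean(s.seq for s in steps):.2f}/7"]
    for s in friction_points(steps, threshold):
        lines.append(f"Friction at step {s.step} (SEQ {s.seq}): {s.think_aloud}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical session, not taken from the paper.
    steps = [
        StepRecord(1, "type query", "Search box is easy to find.", 7),
        StepRecord(2, "open filters", "Date filter is hidden behind a menu.", 3),
        StepRecord(3, "select result", "Result list is clear.", 6),
    ]
    print(summarize(steps, sus=55.0))
```

With steps rated 7, 3, and 6, only step 2 falls at or below the threshold, so its Think Aloud note is surfaced as a friction point beside a mean SEQ of 5.33: the numbers say where ease dropped, and the verbalized reasoning says why.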
Topics
- OpenFlo
- Automated UX Evaluation
- GUI Grounding
- System Usability Scale
- Single Ease Question
Best for: Research Scientist, AI Scientist, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.