PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

PerceptUI is a framework from Woven by Toyota for persona-conditioned UI/UX evaluation. It predicts how a specific user would answer questions about an interface and generates a natural-language rationale for each answer. Training has two stages: contrastive reflection fine-tuning, which distills teacher-generated rationales that extract lessons from human decisions, followed by a reflective prompt-evolution step driven by the model's own failure traces. Instantiated with Qwen-VL, PerceptUI is reported to reach human-level realism, generalize to unseen questions and personas, and produce calibrated population-level answer distributions. Evaluated on WiserUI-Bench, UIClip/BetterApp, and a proprietary UXCar survey, it consistently outperforms baselines in design selection, quality prediction, rationale quality, and persona-conditioned UI/UX rating, in both personalized and non-personalized scenarios, while offering a more cost-effective evaluation approach.

Key takeaway

For UI/UX designers and product teams evaluating early-stage interfaces, PerceptUI offers a way to simulate persona-conditioned user responses. The framework can be used to flag potential usability issues quickly, compare design alternatives, and estimate population-level preferences without the high cost of extensive human studies. Synthetic evaluations complement human validation, however, and do not replace it. Real user studies remain necessary, especially for interactive experiences and as a check on inherent model biases.
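As a minimal sketch of how a team might check population-level estimates against a small human sample, the snippet below turns lists of answers into answer distributions and compares them. The function names, the example answers, and the choice of total variation distance are illustrative assumptions; the summary does not state which calibration metric PerceptUI uses.

```python
from collections import Counter
from typing import Dict, Iterable, List


def answer_distribution(answers: Iterable[str], options: List[str]) -> Dict[str, float]:
    """Share of responses per option; answers outside `options` are ignored."""
    counts = Counter(answers)
    total = sum(counts[o] for o in options)
    if total == 0:
        raise ValueError("no answers matched the given options")
    return {o: counts[o] / total for o in options}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two distributions over the same options.

    0.0 means identical distributions, 1.0 means completely disjoint ones.
    This metric is an assumption chosen for illustration.
    """
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Hypothetical answers to "Which design do you prefer, A or B?"
OPTIONS = ["A", "B"]
synthetic = answer_distribution(["A", "A", "B", "A"], OPTIONS)  # simulated personas
human = answer_distribution(["A", "B", "B", "A"], OPTIONS)      # small human sample
gap = total_variation(synthetic, human)
```

A large gap on a pilot sample would be a signal to rely more on human validation for that question before trusting the synthetic distribution.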

Key insights

PerceptUI uses LLM agents to simulate persona-conditioned UI/UX responses and rationales, with reported human-level realism and generalization to unseen questions and personas.

Principles

Method

PerceptUI trains a vision-language model via contrastive reflection fine-tuning using teacher-generated rationales, followed by reflective prompt evolution to optimize the inference prompt from model failure traces.
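The second stage can be pictured with a toy loop. This is only a sketch of the general idea of evolving a prompt from failure traces, not PerceptUI's actual procedure: `predict` and `reflect` are hypothetical stand-ins for a vision-language model call and a reflection step, and the toy data is invented for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]        # (question, human answer)
Failure = Tuple[str, str, str]   # (question, human answer, model answer)


def accuracy(predict: Callable[[str, str], str], prompt: str, data: List[Example]) -> float:
    """Fraction of examples where the model's answer matches the human answer."""
    return sum(predict(prompt, q) == a for q, a in data) / len(data)


def evolve_prompt(predict, reflect, prompt: str, data: List[Example], rounds: int = 3):
    """Greedy loop: collect failures, ask `reflect` for a revised prompt,
    and keep the revision only if accuracy improves."""
    best_prompt, best_score = prompt, accuracy(predict, prompt, data)
    for _ in range(rounds):
        failures: List[Failure] = []
        for q, a in data:
            guess = predict(best_prompt, q)
            if guess != a:
                failures.append((q, a, guess))
        if not failures:
            break
        candidate = reflect(best_prompt, failures)
        score = accuracy(predict, candidate, data)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins (assumptions): a real setup would call a model in both functions.
def toy_predict(prompt: str, question: str) -> str:
    if "low contrast" in question and "contrast" in prompt:
        return "B"
    return "A"


def toy_reflect(prompt: str, failures: List[Failure]) -> str:
    return prompt + " Pay attention to contrast."


DATA = [
    ("Which design is clearer? (A has low contrast)", "B"),
    ("Which design is clearer? (both legible)", "A"),
]
```

For brevity the sketch scores candidate prompts on the same examples that produced the failures; in practice a held-out split would be the safer choice to avoid overfitting the prompt.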

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.