PerceptUI: LLM Agents as Human-Aligned Synthetic Users for UI/UX Evaluation
Summary
PerceptUI is a novel framework developed by Woven by Toyota for persona-conditioned UI/UX evaluation, designed to predict how specific users respond to interface questions and generate natural-language rationales. The framework employs a two-stage training process: contrastive reflection fine-tuning distills teacher-generated rationales by extracting lessons from human decisions, followed by a reflective prompt-evolution step based on the model's own failure traces. Instantiated with Qwen-VL, PerceptUI achieves human-level realism, generalizes to unseen questions and personas, and produces calibrated population-level answer distributions. Evaluated across diverse datasets including WiserUI-Bench, UIClip/BetterApp, and a proprietary UXCar survey, PerceptUI consistently outperforms baselines in design selection, quality prediction, rationale quality, and persona-conditioned UI/UX rating, demonstrating its effectiveness in both personalized and non-personalized scenarios while offering a more cost-effective evaluation approach.
Key takeaway
For UI/UX designers and product teams evaluating early-stage interfaces, PerceptUI offers a way to simulate persona-conditioned user responses. You can use it to identify potential usability issues quickly, compare design alternatives, and estimate population-level preferences without the cost of extensive human studies. Synthetic evaluations complement human validation rather than replace it: real user studies remain necessary, especially for interactive experiences and to offset inherent model biases.
Key insights
PerceptUI leverages LLM agents to simulate persona-conditioned UI/UX responses and rationales, achieving human-level realism and generalization.
Principles
- UI judgments are highly user-dependent.
- Contrastive rationales enhance model reasoning.
- Prompt evolution adapts to task-specific errors.
Method
PerceptUI trains a vision-language model via contrastive reflection fine-tuning using teacher-generated rationales, followed by reflective prompt evolution to optimize the inference prompt from model failure traces.
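The second stage, reflective prompt evolution, can be pictured as a loop that collects the model's failures, asks a reflector to revise the prompt, and keeps the revision only if it scores better. The summary does not give the actual procedure, so the following is a minimal generic sketch, not PerceptUI's algorithm. The names `evolve_prompt`, `toy_model`, and `toy_reflect`, the greedy acceptance rule, and the toy data are all assumptions made for illustration; in practice the model and reflector would be calls to a vision-language model.

```python
def accuracy(prompt, examples, model):
    """Fraction of examples the model answers correctly under this prompt."""
    hits = sum(model(prompt, ex["input"]) == ex["label"] for ex in examples)
    return hits / len(examples)


def evolve_prompt(prompt, examples, model, reflect, rounds=3):
    """Greedy prompt evolution (assumed rule): keep a revision only if it improves accuracy."""
    best, best_score = prompt, accuracy(prompt, examples, model)
    for _ in range(rounds):
        # Failure traces: examples the current best prompt still gets wrong.
        failures = [ex for ex in examples if model(best, ex["input"]) != ex["label"]]
        if not failures:
            break
        candidate = reflect(best, failures)
        score = accuracy(candidate, examples, model)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical stand-ins for the model and the reflector.
def toy_model(prompt, ui):
    if "legibility" in prompt and ui == "small_text":
        return "B"
    return "A"


def toy_reflect(prompt, failures):
    return prompt + " Consider legibility."


examples = [
    {"input": "small_text", "label": "B"},
    {"input": "clean_layout", "label": "A"},
]
best, score = evolve_prompt("Pick the better design.", examples, toy_model, toy_reflect)
```

Here the starting prompt gets one of the two toy examples wrong, the reflector appends a hint derived from that failure, and the revised prompt is kept because it scores higher.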
In practice
- Simulate user responses for early design screening.
- Estimate population-level UI response distributions.
- Generate persona-aware design critiques.
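Estimating a population-level response distribution amounts to querying the model once per persona and tallying the answers. The sketch below illustrates that aggregation and compares it to a reference distribution using total variation distance. It is an illustration under assumptions, not the paper's evaluation code: `toy_predict` stands in for a persona-conditioned model call, and the personas and the "human" distribution are made-up numbers.

```python
from collections import Counter


def answer_distribution(personas, predict, options):
    """Aggregate one predicted answer per persona into a distribution over options."""
    if not personas:
        raise ValueError("need at least one persona")
    counts = Counter(predict(p) for p in personas)
    return {opt: counts.get(opt, 0) / len(personas) for opt in options}


def total_variation(p, q):
    """Total variation distance between two distributions over the same options."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Hypothetical stand-in for a persona-conditioned model call.
def toy_predict(persona):
    return "A" if persona["age"] < 40 else "B"


personas = [{"age": 25}, {"age": 31}, {"age": 52}, {"age": 67}]
synthetic = answer_distribution(personas, toy_predict, ["A", "B"])

# Made-up reference distribution, for illustration only.
human = {"A": 0.75, "B": 0.25}
gap = total_variation(synthetic, human)
```

A smaller gap means the synthetic population tracks the reference more closely; other divergence measures could be substituted for total variation.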
Topics
- PerceptUI Framework
- LLM Agents
- UI/UX Evaluation
- Persona-Conditioned AI
- Multimodal LLMs
- Prompt Optimization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.