KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

2026-04-09 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

KnowU-Bench is a new online benchmark designed to evaluate personalized mobile agents in a reproducible Android emulation environment. It addresses limitations of prior benchmarks by focusing on interactive preference elicitation, proactive assistance, and consent handling, rather than static preference recovery or fixed intent prediction. The benchmark includes 42 general GUI tasks, 86 personalized tasks, and 64 proactive tasks. Crucially, KnowU-Bench hides user profiles from agents, requiring them to infer preferences from behavioral logs and engage in multi-turn clarification dialogues via an LLM-driven user simulator. It evaluates the complete proactive decision chain, from GUI execution to consent negotiation and post-rejection restraint, using a hybrid rule-based and LLM-as-a-Judge scoring protocol. Initial experiments show that even frontier models like Claude Sonnet 4.6 perform below 50% on tasks requiring preference inference or intervention calibration, highlighting a significant gap in current agent capabilities.

Key takeaway

Research scientists developing personalized mobile agents should prioritize systems capable of genuine preference inference through interactive dialogue and robust proactive decision-making. Evaluation metrics must extend beyond GUI navigation to cover multi-turn preference elicitation, consent negotiation, and appropriate restraint after rejection, because current frontier models show significant weaknesses in these areas. This shift is critical for building trustworthy and effective personal assistants.

Key insights

Evaluating personalized mobile agents requires dynamic preference inference and proactive interaction, not just static context lookup.

Principles

User profiles should be hidden for genuine preference inference.
Proactive agents need to negotiate consent and respect rejections.
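
To make the first principle concrete, here is a minimal sketch of inferring a preference from behavioral logs and falling back to a clarification question when the evidence is weak. It is an illustration only, not KnowU-Bench's method: the function name infer_preference, the min_share threshold, and the sample log values are all hypothetical.

```python
from collections import Counter
from typing import List, Optional


def infer_preference(log_values: List[str], min_share: float = 0.7) -> Optional[str]:
    """Return the dominant logged value, or None if no value is dominant enough.

    Hypothetical helper: a None result signals that the agent should ask the
    user a clarifying question instead of guessing.
    """
    if not log_values:
        return None
    value, count = Counter(log_values).most_common(1)[0]
    # Only commit to a preference when it covers a large share of the log.
    return value if count / len(log_values) >= min_share else None


# Hypothetical behavioral log: past choices observed for one slot.
milk_log = ["oat milk", "oat milk", "oat milk", "whole milk"]
choice = infer_preference(milk_log)
if choice is None:
    print("Ask the user which option they prefer.")
else:
    print(f"Proceed with inferred preference: {choice}")
```

The threshold encodes the trade-off between acting on an inferred preference and interrupting the user; the value 0.7 is arbitrary.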

Method

KnowU-Bench uses a reproducible Android emulation environment, an LLM-driven user simulator for multi-turn preference elicitation, and a hybrid rule-based/LLM-as-a-Judge scoring protocol to evaluate personalized and proactive mobile agents.
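
One way to picture a hybrid rule-based/LLM-as-a-Judge protocol is the sketch below. It is an assumption-laden illustration, not the benchmark's actual scoring code: the Episode fields, the equal weighting, and the stubbed judge (a plain function standing in for an LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Episode:
    final_state: Dict[str, str]  # hypothetical snapshot of app state after the agent acts
    dialogue: List[str]          # hypothetical transcript of agent/user turns


RuleCheck = Callable[[Dict[str, str]], bool]
Judge = Callable[[List[str]], float]  # expected to return a score in [0, 1]


def hybrid_score(episode: Episode, rule_checks: List[RuleCheck],
                 judge: Judge, rule_weight: float = 0.5) -> float:
    """Blend deterministic state checks with a judge's rating of the dialogue."""
    if not rule_checks:
        raise ValueError("at least one rule check is required")
    passed = sum(1 for check in rule_checks if check(episode.final_state))
    rule_score = passed / len(rule_checks)
    # Clamp the judge output so a misbehaving judge cannot leave [0, 1].
    judge_score = min(1.0, max(0.0, judge(episode.dialogue)))
    return rule_weight * rule_score + (1.0 - rule_weight) * judge_score


# Hypothetical usage: one of two state checks passes, the stub judge returns 0.8.
episode = Episode(final_state={"alarm_time": "07:00"},
                  dialogue=["Agent: Set the alarm for 7?", "User: Yes."])
checks = [lambda s: s.get("alarm_time") == "07:00",
          lambda s: s.get("alarm_label") == "Gym"]
score = hybrid_score(episode, checks, judge=lambda turns: 0.8)
print(round(score, 2))  # 0.65
```

Rule checks suit outcomes that can be verified from device state, while the judge covers qualities such as dialogue appropriateness that resist exact matching.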

In practice

Test agents on dynamic preference acquisition.
Implement consent negotiation in proactive systems.
Focus on intervention calibration for agents.
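
The consent and restraint points above can be sketched as a small piece of session state. This is a hypothetical design, not something taken from KnowU-Bench: the class ProactiveSession, its method names, and the sample suggestion string are invented for illustration.

```python
from enum import Enum
from typing import Set


class Consent(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class ProactiveSession:
    """Tracks user responses so the agent acts only with consent."""

    def __init__(self) -> None:
        self._accepted: Set[str] = set()
        self._rejected: Set[str] = set()

    def should_offer(self, suggestion: str) -> bool:
        # Post-rejection restraint: never re-offer a declined suggestion.
        return suggestion not in self._rejected

    def record_response(self, suggestion: str, consent: Consent) -> None:
        if consent is Consent.REJECTED:
            self._rejected.add(suggestion)
            self._accepted.discard(suggestion)
        else:
            self._accepted.add(suggestion)
            self._rejected.discard(suggestion)

    def may_execute(self, suggestion: str) -> bool:
        # Execution requires explicit prior acceptance, not just the absence of rejection.
        return suggestion in self._accepted


session = ProactiveSession()
idea = "mute notifications during the 3pm meeting"  # hypothetical suggestion
if session.should_offer(idea):
    session.record_response(idea, Consent.REJECTED)  # simulated user reply
print(session.should_offer(idea), session.may_execute(idea))  # False False
```

Keeping "may offer" and "may execute" as separate checks mirrors the distinction between proposing an intervention and carrying it out.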

Topics

KnowU-Bench
Mobile Agent Evaluation
Personalized Agents
Proactive Assistance
LLM-driven User Simulator

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.