CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

2026-05-14 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

CAPED (Context-Aware Privacy Exposure Defense) is a phone-side, pre-upload protection layer for mobile GUI agents. It targets "incidental visual privacy exposure": private content that happens to be visible on screen but is irrelevant to the task. Before a screenshot is sent to a remote multimodal agent, the system extracts the task's requirements, uses the screen context as a privacy prior, parses the visible UI elements, and exposes only the content the current task needs, masking incidental private data. CAPED was evaluated on AndroidWorld for broad task utility and on a controlled 28-task seeded privacy evaluation. On the seeded evaluation, Full CAPED reduced the success-conditioned weighted seeded leakage rate (WSLR) from 0.766 with raw screenshots to 0.268, while maintaining a task utility of 0.929. The broader AndroidWorld run revealed a utility cost at the prototype stage: CAPED completed 64 of 116 tasks (55.2%), versus 77 (66.4%) for the unprotected baseline. The results argue for treating each screenshot upload as an explicit device–cloud boundary decision, governed by task-driven selective exposure.
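The reported numbers are easier to weigh in relative terms. The snippet below only re-derives ratios from the figures stated above; it does not reimplement the WSLR metric itself, whose weighting and success conditioning are not specified in this summary.

```python
# Figures as reported in the summary; only the derived ratios are computed here.
RAW_WSLR = 0.766    # leakage with raw screenshots (seeded privacy evaluation)
CAPED_WSLR = 0.268  # leakage with Full CAPED

TOTAL_TASKS = 116         # broader AndroidWorld run
CAPED_COMPLETED = 64
BASELINE_COMPLETED = 77

# Fraction of the raw-screenshot leakage that CAPED removes.
relative_reduction = (RAW_WSLR - CAPED_WSLR) / RAW_WSLR

# Completion rates and the utility gap in percentage points.
caped_rate = CAPED_COMPLETED / TOTAL_TASKS
baseline_rate = BASELINE_COMPLETED / TOTAL_TASKS
gap_points = (baseline_rate - caped_rate) * 100

if __name__ == "__main__":
    print(f"Relative WSLR reduction: {relative_reduction:.1%}")
    print(f"Completion: CAPED {caped_rate:.1%}, baseline {baseline_rate:.1%}")
    print(f"Utility gap: {gap_points:.1f} points ({BASELINE_COMPLETED - CAPED_COMPLETED} tasks)")
```

Put this way, CAPED removes roughly 65% of the seeded leakage relative to raw screenshots, against a completion gap of about 11.2 percentage points (13 tasks) on the broader AndroidWorld run.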

Key takeaway

AI engineers building mobile GUI agents should add a phone-side, pre-upload privacy control such as CAPED to prevent incidental visual data leakage. This approach exposes task-relevant content while masking sensitive, task-irrelevant information; in the seeded evaluation it cut weighted leakage from 0.766 to 0.268. Interpret the task locally and apply context-aware selective exposure to balance utility against user privacy, keeping in mind that the prototype still carries a measurable utility cost on broad benchmarks, and treat each screenshot upload as a deliberate device–cloud boundary decision.

Key insights

Mobile GUI agents require task-driven selective exposure to prevent incidental visual privacy leakage at the device-cloud boundary.

Principles

Protect before upload on the phone side.
Preserve task utility through element granularity.
Use screen context as a privacy prior.
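The first two principles (decide on the phone, decide per element) and the third (screen context as a prior) combine naturally into a per-element rule. A minimal sketch, assuming a hypothetical table of per-context sensitivity thresholds and a locally estimated sensitivity score for each element; neither the context labels nor the scoring come from the summary.

```python
# Hypothetical per-context thresholds: a task-irrelevant element stays visible only
# if its locally estimated sensitivity (0.0 to 1.0) is below the threshold.
CONTEXT_THRESHOLD = {
    "launcher": 0.8,   # home screen, mostly app icons: permissive prior
    "settings": 0.5,
    "messaging": 0.2,  # conversations: strict prior
    "banking": 0.0,    # nothing task-irrelevant is exposed
}
DEFAULT_THRESHOLD = 0.3  # unrecognized screens lean conservative


def expose(context: str, task_relevant: bool, sensitivity: float) -> bool:
    """Per-element decision: task relevance first, then the context prior."""
    if task_relevant:
        return True  # content the task needs is always kept to preserve utility
    threshold = CONTEXT_THRESHOLD.get(context, DEFAULT_THRESHOLD)
    return sensitivity < threshold
```

Under these assumed thresholds, the same low-sensitivity widget (say 0.1) is exposed on the launcher but masked in a banking app, while an element the task needs is exposed in either context.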

Method

CAPED runs on the device: it extracts task requirements locally, classifies the screen context, parses UI elements, resolves an exposure decision for each element from its task relevance and modality, and then redacts the screenshot before upload.
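These stages can be sketched end to end under loud assumptions: keyword overlap stands in for local task interpretation, a package-name lookup stands in for context classification, and UI elements arrive already parsed with bounding boxes. All names (`UIElement`, `extract_requirements`, `classify_context`, `resolve`, `redact`), the package names, and the short-label heuristic are hypothetical, not CAPED's implementation. The screenshot is a plain grid of pixel values so the example needs no imaging library.

```python
from dataclasses import dataclass

PUNCT = ".,!?:;"
STOPWORDS = {"the", "a", "an", "to", "in", "on", "my", "and", "of", "for"}
SENSITIVE_APPS = {"com.example.bank", "com.example.chat"}  # hypothetical packages


@dataclass
class UIElement:
    text: str                        # visible label or OCR text ("" for pure images)
    modality: str                    # hypothetical labels: "text", "image", "input"
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def extract_requirements(instruction: str) -> set[str]:
    """Stand-in for local task interpretation: the instruction's content words."""
    words = {w.strip(PUNCT).lower() for w in instruction.split()}
    return words - STOPWORDS


def classify_context(package: str) -> str:
    """Stand-in for screen-context classification: a lookup, not a model."""
    return "sensitive" if package in SENSITIVE_APPS else "general"


def resolve(el: UIElement, reqs: set[str], context: str) -> bool:
    """Return True to expose the element, False to mask it."""
    if any(w.strip(PUNCT).lower() in reqs for w in el.text.split()):
        return True  # task-relevant: keep visible
    if context == "sensitive":
        return False  # strict prior: mask everything the task does not name
    # General screens: keep short text labels (buttons, menu items) for navigation;
    # mask longer free text and images, which are likelier to be personal content.
    return el.modality == "text" and len(el.text.split()) <= 2


def redact(pixels: list[list[int]], elements: list[UIElement],
           reqs: set[str], context: str) -> list[list[int]]:
    """Black out every element that is not exposed, on a copy of the screenshot."""
    out = [row[:] for row in pixels]
    for el in elements:
        if resolve(el, reqs, context):
            continue
        left, top, right, bottom = el.bbox
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 0
    return out


# Usage: a 3x4 "screenshot" of a settings screen.
reqs = extract_requirements("Open the Wi-Fi settings")
screen = [[255] * 4 for _ in range(3)]
elements = [
    UIElement("Wi-Fi", "text", (0, 0, 2, 1)),                # needed by the task
    UIElement("Back", "text", (2, 0, 4, 1)),                 # short navigation label
    UIElement("Alice: see you at 6", "text", (0, 1, 4, 2)),  # message preview
    UIElement("", "image", (2, 2, 4, 3)),                    # incidental photo
]
masked = redact(screen, elements, reqs, classify_context("com.android.settings"))
```

Redacting a copy keeps the raw screenshot on the device; in this sketch only the masked grid would cross the device–cloud boundary. On the settings screen the message preview and the photo are blacked out, while on a screen classified as sensitive the "Back" label would be masked as well.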

In practice

Implement local task requirement extraction.
Apply context-aware default privacy postures.
Use modality-specific verification for elements.
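Modality-specific verification could take the form of a small dispatch table. This is one reading, assumed rather than taken from the paper, in which each element type needs its own evidence of task relevance before it is exposed: text is matched against the required terms directly, images fall back to their accessibility content description, and input fields are judged by their hint. The metadata keys (`text`, `content_desc`, `hint`) and all function names are hypothetical.

```python
from typing import Callable

# Hypothetical per-modality checks: each decides whether an element has enough
# evidence of task relevance to be exposed. `reqs` holds the task's required terms.


def verify_text(meta: dict, reqs: set[str]) -> bool:
    # Visible text can be matched against the required terms directly.
    text = meta.get("text", "").lower()
    return any(term in text for term in reqs)


def verify_image(meta: dict, reqs: set[str]) -> bool:
    # Pixels cannot be string-matched; rely on the accessibility content
    # description and mask the image when there is none.
    desc = meta.get("content_desc", "").lower()
    return bool(desc) and any(term in desc for term in reqs)


def verify_input(meta: dict, reqs: set[str]) -> bool:
    # Judge an input field by its hint or label, not by what has been typed into it.
    hint = meta.get("hint", "").lower()
    return any(term in hint for term in reqs)


VERIFIERS: dict[str, Callable[[dict, set[str]], bool]] = {
    "text": verify_text,
    "image": verify_image,
    "input": verify_input,
}


def verified_exposure(modality: str, meta: dict, reqs: set[str]) -> bool:
    """Dispatch to the modality's verifier; unknown modalities are masked."""
    verifier = VERIFIERS.get(modality)
    return verifier is not None and verifier(meta, reqs)
```

With `reqs = {"wi-fi", "search"}`, a "Wi-Fi" label and a field hinted "Search settings" pass, while an image without a content description and any unrecognized modality are masked. Defaulting to mask when evidence is missing errs on the privacy-preserving side, at the cost of occasionally hiding something the agent needed.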

Topics

Mobile GUI Agents
Visual Privacy Exposure
Context-Aware Redaction
Device-Cloud Security
AndroidWorld Benchmark
Multimodal Models

Best for: Research Scientist, AI Scientist, AI Security Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.