POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents
Summary
POLAR-Bench is a diagnostic benchmark for evaluating privacy-utility trade-offs in LLM agents. It pits a trusted model against an adversarial third-party model across 10 diverse domains and 7,852 samples, and scores both privacy and utility deterministically. Privacy policy dimensions and attack strategies vary along two orthogonal axes, yielding a $5\times 5$ diagnostic surface for each model. The key finding is a significant performance gap: current frontier models such as GLM-5.1, GPT-5.4, Gemma-4-31B, and DeepSeek-V3.1 withhold over 99% of protected attributes while maintaining high utility. In contrast, smaller open-weight models in the 1–30B parameter range, which are often deployed on-device, score notably worse, with some leaking more than half of the private data. The research indicates that alignment and training choices matter more than model size for balancing privacy and utility.
Key takeaway
Machine learning engineers deploying LLM agents that handle private user data should recognize that smaller, locally deployable models (1–30B parameters) can leak substantially under adversarial conditions, in some cases more than 50% of private data. Do not treat model size as a proxy for privacy; prioritize rigorous privacy alignment and training choices instead. Before deployment, stress-test candidate models against diverse attack strategies and privacy policy dimensions with a diagnostic benchmark such as POLAR-Bench to confirm that privacy and utility are both preserved.
Key insights
POLAR-Bench diagnoses LLM agent privacy-utility trade-offs, showing frontier models significantly outperform smaller, on-device counterparts.
Principles
- The privacy-utility trade-off is not inherent: frontier models withhold protected attributes while keeping utility high, so alignment matters.
- Model size alone is not a reliable privacy predictor.
- Incremental attacks are more privacy-threatening than direct ones.
Method
POLAR-Bench uses a two-agent setup with a trusted model and an adversarial probe. It scores privacy and utility deterministically via regex-validated documents across 10 domains, varying policy dimensions and attack strategies.
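As a rough illustration of what deterministic, regex-based scoring could look like, the Python sketch below checks a trusted model's reply against patterns for protected and permitted attributes. The function name `score_reply`, the attribute names, the example replies, and the scoring formulas are all assumptions made for illustration; this summary does not specify POLAR-Bench's actual scoring rules.

```python
import re


def score_reply(reply, protected, permitted):
    """Score one reply from the trusted model (hypothetical scoring rule).

    protected / permitted map an attribute name to a regex that matches
    the attribute's value. Returns (privacy, utility, leaked_names).
    """
    # An attribute counts as disclosed if its pattern occurs anywhere in the reply.
    leaked = sorted(n for n, pat in protected.items() if re.search(pat, reply))
    shared = sorted(n for n, pat in permitted.items() if re.search(pat, reply))

    # Privacy: fraction of protected attributes withheld (assumed definition).
    privacy = 1.0 - len(leaked) / len(protected) if protected else 1.0
    # Utility: fraction of permitted attributes actually provided (assumed definition).
    utility = len(shared) / len(permitted) if permitted else 1.0
    return privacy, utility, leaked


# Hypothetical attributes for a single healthcare-style sample.
protected = {
    "diagnosis": r"(?i)\bdiabetes\b",
    "national_id": r"\b123-45-6789\b",
}
permitted = {
    "appointment_date": r"(?i)\b14 March\b",
    "clinic": r"(?i)\bRiverside Clinic\b",
}

safe_reply = (
    "Your appointment at Riverside Clinic is on 14 March. "
    "I cannot share medical details."
)
leaky_reply = "The patient has diabetes and is booked for 14 March."

safe_score = score_reply(safe_reply, protected, permitted)
leaky_score = score_reply(leaky_reply, protected, permitted)
```

Because the check is pure pattern matching, the same reply always receives the same score, which is the property that deterministic scoring is meant to provide. In this toy sample the first reply withholds both protected attributes and supplies both permitted ones, while the second discloses the diagnosis and omits the clinic.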
In practice
- Stress-test inference-time defenses like PrivacyChecker.
- Evaluate models on the $5\times 5$ policy $\times$ attack surface.
- Prioritize alignment and training for privacy-utility balance.
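To make the $5\times 5$ surface concrete, the sketch below averages per-sample privacy scores into a grid indexed by policy level and attack level, then reports the weakest cell. The record layout, the 0–4 level indices, and the helper names `build_surface` and `weakest_cell` are hypothetical; they are not taken from the benchmark's own tooling.

```python
N_LEVELS = 5  # five policy levels by five attack levels


def build_surface(records, metric="privacy"):
    """Average `metric` per (policy, attack) cell; empty cells are None."""
    sums = [[0.0] * N_LEVELS for _ in range(N_LEVELS)]
    counts = [[0] * N_LEVELS for _ in range(N_LEVELS)]
    for rec in records:
        p, a = rec["policy"], rec["attack"]
        sums[p][a] += rec[metric]
        counts[p][a] += 1
    return [
        [sums[p][a] / counts[p][a] if counts[p][a] else None
         for a in range(N_LEVELS)]
        for p in range(N_LEVELS)
    ]


def weakest_cell(surface):
    """Return (policy, attack, score) for the lowest-scoring populated cell."""
    cells = [
        (score, p, a)
        for p, row in enumerate(surface)
        for a, score in enumerate(row)
        if score is not None
    ]
    if not cells:
        return None
    score, p, a = min(cells)
    return p, a, score


# Hypothetical per-sample results for one model.
records = [
    {"policy": 0, "attack": 0, "privacy": 1.0},
    {"policy": 0, "attack": 0, "privacy": 0.5},
    {"policy": 4, "attack": 4, "privacy": 0.25},
]

surface = build_surface(records)
worst = weakest_cell(surface)
```

Looking at the whole grid rather than a single average shows where a model breaks down, for example only under the most complex policies combined with the strongest attacks.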
Topics
- LLM Agents
- Privacy-Utility Trade-off
- POLAR-Bench
- Adversarial Attacks
- Privacy Alignment
- Model Evaluation
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.