POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

2026-05-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

POLAR-Bench is a new diagnostic benchmark for evaluating privacy-utility trade-offs in LLM agents. It pairs a trusted model with an adversarial third-party model across 10 diverse domains and 7,852 samples. Privacy and utility are scored deterministically, and privacy policy dimensions and attack strategies are varied along two orthogonal axes, yielding a $5\times 5$ diagnostic surface for each model. The key finding is a significant performance gap: current frontier models such as GLM-5.1, GPT-5.4, Gemma-4-31B, and DeepSeek-V3.1 withhold over 99% of protected attributes while maintaining high utility. In contrast, smaller open-weight models in the 1–30B parameter range, which are often deployed on-device, score notably worse, with some leaking over half of the private data. The research indicates that alignment and training choices matter more than model size for balancing privacy and utility.

Key takeaway

Machine learning engineers deploying LLM agents that handle private user data should recognize that smaller, locally deployable models (1–30B parameters) can leak substantially under adversarial conditions, in some cases exposing more than half of the private data. Do not rely on model size alone as an indicator of privacy behavior; instead, prioritize rigorous privacy alignment and training choices. Before deployment, stress-test your chosen models against diverse attack strategies and privacy policy complexities using diagnostic benchmarks like POLAR-Bench to confirm a robust privacy-utility balance.

Key insights

POLAR-Bench diagnoses LLM agent privacy-utility trade-offs, showing frontier models significantly outperform smaller, on-device counterparts.

Principles

Privacy-utility trade-offs are not fundamental; alignment matters.
Model size alone is not a reliable privacy predictor.
Incremental attacks are more privacy-threatening than direct ones.

Method

POLAR-Bench uses a two-agent setup with a trusted model and an adversarial probe. It scores privacy and utility deterministically via regex-validated documents across 10 domains, varying policy dimensions and attack strategies.
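As a rough illustration of what deterministic, regex-based scoring can look like, the sketch below checks one sample for leaked protected attributes and completed task fields. It is not the benchmark's actual implementation: the `Sample` structure, the `score_sample` function, the field names, and the scoring formulas (fraction of protected attributes withheld, fraction of required fields present) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Hypothetical structure: attribute name -> regex that matches its value.
    protected: dict = field(default_factory=dict)  # must NOT reach the adversary
    required: dict = field(default_factory=dict)   # must appear in the task output


def score_sample(adversary_view: str, task_output: str, sample: Sample):
    """Return (privacy, utility), each in [0, 1], using only regex matches."""
    leaked = sum(
        bool(re.search(pattern, adversary_view))
        for pattern in sample.protected.values()
    )
    completed = sum(
        bool(re.search(pattern, task_output))
        for pattern in sample.required.values()
    )
    # Assumed definitions: privacy = share of protected attributes withheld,
    # utility = share of required fields produced.
    privacy = 1.0 - leaked / len(sample.protected) if sample.protected else 1.0
    utility = completed / len(sample.required) if sample.required else 1.0
    return privacy, utility


sample = Sample(
    protected={
        "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
        "dob": r"\b1990-04-12\b",
    },
    required={"appointment": r"\bconfirmed for Tuesday\b"},
)
adversary_view = "The date of birth is 1990-04-12, but I cannot share identifiers."
task_output = "Your appointment is confirmed for Tuesday."

privacy, utility = score_sample(adversary_view, task_output, sample)
print(privacy, utility)
```

Because every check is a pattern match against fixed text, the same transcript always produces the same score, which is the property the deterministic setup relies on; no judge model is involved in this sketch.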

In practice

Stress-test inference-time defenses like PrivacyChecker.
Evaluate models on the $5\times 5$ policy-by-attack surface.
Prioritize alignment and training for privacy-utility balance.
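One way to picture the policy-by-attack evaluation is to average per-sample scores into a 5-by-5 grid, with one row per policy level and one column per attack level. The sketch below is a minimal version under stated assumptions: the integer level indices 0 to 4, the `(policy, attack, score)` record format, and the `diagnostic_surface` helper are hypothetical and not taken from the benchmark.

```python
def diagnostic_surface(records, size=5):
    """Average scores into a size x size grid indexed as grid[policy][attack].

    `records` is an iterable of (policy_level, attack_level, score) tuples,
    where both levels are integers in range(size). Cells with no samples
    are reported as None rather than 0.0 so gaps stay visible.
    """
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for policy, attack, score in records:
        sums[policy][attack] += score
        counts[policy][attack] += 1
    return [
        [
            sums[i][j] / counts[i][j] if counts[i][j] else None
            for j in range(size)
        ]
        for i in range(size)
    ]


# Illustrative privacy scores only; these numbers are made up for the example.
records = [
    (0, 0, 1.0),
    (0, 0, 0.5),
    (4, 4, 0.0),
]
surface = diagnostic_surface(records)
for row in surface:
    print(row)
```

Building one such grid for privacy and another for utility makes it easier to see where a model degrades, for example whether scores fall off along the attack axis, the policy axis, or both.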

Topics

LLM Agents
Privacy-Utility Trade-off
POLAR-Bench
Adversarial Attacks
Privacy Alignment
Model Evaluation

Code references

deepseek-ai/DeepSeek-R1

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.