COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
Summary
COMPASS (Company/Organization Policy Alignment Assessment) is a new framework designed to systematically evaluate large language models' adherence to organization-specific allowlist and denylist policies. Developed for high-stakes enterprise applications in sectors like healthcare and finance, COMPASS addresses a gap in existing safety evaluations, which primarily focus on universal harms. The framework was applied to eight diverse industry scenarios, generating 5,920 validated queries to test both routine compliance and adversarial robustness through edge cases. Evaluations of seven state-of-the-art LLMs revealed a significant asymmetry: models achieved over 95% accuracy on legitimate allowlist requests but failed catastrophically on denylist enforcement, refusing only 13-40% of adversarial violations. This indicates current LLMs lack the necessary robustness for policy-critical deployments.
Key takeaway
CTOs and VPs of Engineering deploying LLMs in high-stakes enterprise applications should prioritize robust policy alignment evaluations. Current LLMs show critical weaknesses in enforcing denylist policies against adversarial inputs, even while handling legitimate requests reliably. Teams should integrate frameworks like COMPASS to test for these vulnerabilities before production deployment, reducing significant operational and reputational risks.
Key insights
LLMs reliably handle allowed requests (over 95% accuracy) but refuse only 13-40% of adversarial denylist violations.
Principles
- Policy alignment is critical for enterprise LLM deployment.
- Adversarial robustness is distinct from routine compliance.
Method
COMPASS uses 5,920 validated queries across eight industry scenarios to test LLM compliance with allowlist and denylist policies, including adversarial edge cases.
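The two headline numbers, allowlist accuracy and denylist refusal rate, can be illustrated with a minimal scoring sketch. This is not the COMPASS implementation: the `Result` record, the `score` function, and the toy data are hypothetical, and the sketch assumes that an allowlist query counts as correct whenever the model does not refuse it.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One judged model response (hypothetical record, not a COMPASS type)."""
    policy: str    # "allowlist" or "denylist"
    refused: bool  # True if the model refused the query


def score(results):
    """Return (allowlist accuracy, denylist refusal rate) as fractions.

    Assumption: an allowlist query is handled correctly when it is NOT
    refused, and a denylist query is handled correctly when it IS refused.
    """
    allow = [r for r in results if r.policy == "allowlist"]
    deny = [r for r in results if r.policy == "denylist"]
    allow_acc = sum(not r.refused for r in allow) / len(allow) if allow else float("nan")
    deny_rate = sum(r.refused for r in deny) / len(deny) if deny else float("nan")
    return allow_acc, deny_rate


# Toy data for illustration only; these are not results from the paper.
results = [
    Result("allowlist", False),
    Result("allowlist", False),
    Result("allowlist", False),
    Result("allowlist", True),
    Result("denylist", True),
    Result("denylist", False),
    Result("denylist", False),
    Result("denylist", False),
]

allow_acc, deny_rate = score(results)
print(f"allowlist accuracy: {allow_acc:.2f}, denylist refusal rate: {deny_rate:.2f}")
```

Reporting the two rates separately matters here: a single blended accuracy figure would hide exactly the asymmetry the evaluation found.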
In practice
- Test LLMs for denylist enforcement failures.
- Prioritize adversarial robustness in LLM evaluations.
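One way to act on these points is a pre-deployment gate that fails when either metric falls below a floor. The sketch below is illustrative only: `passes_gate` and both default thresholds are assumptions chosen for the example, not values recommended by COMPASS.

```python
def passes_gate(allow_acc, deny_refusal_rate, min_allow=0.95, min_deny=0.95):
    """Return True only if both policy metrics meet their floors.

    The default thresholds are hypothetical; each organization would set
    its own based on the risk of the deployment.
    """
    return allow_acc >= min_allow and deny_refusal_rate >= min_deny


# A model in the range the summary reports (over 95% on allowlist queries,
# at most 40% refusal on adversarial denylist queries) fails this gate.
blocked = not passes_gate(allow_acc=0.96, deny_refusal_rate=0.40)
print("deployment blocked:", blocked)
```

Checking both conditions independently keeps strong allowlist performance from masking weak denylist enforcement.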
Topics
- LLM Policy Alignment
- Organizational AI Safety
- Adversarial Robustness
- Denylist Evaluation
- Enterprise LLMs
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, MLOps Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.