Reinforcing privacy reasoning in LLMs via normative simulacra from fiction
Summary
Researchers from Cornell Tech propose a method to improve the contextual privacy reasoning of Large Language Models (LLMs) by training them on "normative simulacra" extracted from novels. The approach targets the mismatch between how LLMs handle information and what people expect with respect to privacy, a gap that existing methods often fail to close because they either carry high inference costs or rely on narrow, task-specific fine-tuning. The team extracts structured representations of norms and information flows from 10 public-domain novels like "Pride and Prejudice" and "1984," creating machine-readable "normative universes." These simulacra are then used to train LLMs with supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO). A composite reward function, which includes normative grounding and per-completion contrastive scoring, guides the GRPO phase. In an evaluation across five Contextual Integrity (CI)-aligned benchmarks, GRPO with normative grounding achieves the highest scores on a law compliance task (GoldCoin-HIPAA) and the strongest correlation with human privacy expectations (Pearson r on ConfAIde), which the authors take as evidence that the approach transfers to real-world domains.
Key takeaway
For research scientists developing privacy-aware AI agents, this work suggests a training approach worth considering: use richly realized narrative texts, such as novels, to generate diverse normative training data. Combined with reinforcement learning (GRPO) and contrastive scoring, this data may give LLMs a more robust and transferable understanding of contextual privacy than compliance-focused fine-tuning alone.
Key insights
Fiction-derived normative simulacra and GRPO can teach LLMs transferable contextual privacy reasoning.
Principles
- Contextual Integrity defines privacy as appropriate information flow within context-relative norms.
- Norms are prescriptive expectations about obligations, prohibitions, permissions, or recommendations.
- Fiction provides rich, layered examples of normative reasoning in fully-realized social contexts.
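To make the principles above concrete, here is a minimal Python sketch of how an information flow and a norm might be represented. The five flow parameters (sender, recipient, subject, information type, transmission principle) are the standard Contextual Integrity parameters, and the four norm kinds come from the list above. Everything else is an assumption for illustration: the class and field names, the matching rule in `judge_flow`, and the healthcare example are hypothetical and are not the paper's schema.

```python
from dataclasses import dataclass
from enum import Enum


class Deontic(Enum):
    """The four norm kinds named in the principles above."""
    OBLIGATION = "obligation"
    PROHIBITION = "prohibition"
    PERMISSION = "permission"
    RECOMMENDATION = "recommendation"


@dataclass(frozen=True)
class InformationFlow:
    """A CI information flow; field names are hypothetical."""
    context: str
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


@dataclass(frozen=True)
class Norm:
    """A simplified context-relative norm (illustrative only)."""
    context: str
    information_type: str
    recipient_role: str
    deontic: Deontic


def judge_flow(flow, norms):
    """Toy rule: a matching prohibition makes the flow inappropriate."""
    matching = [
        n for n in norms
        if n.context == flow.context
        and n.information_type == flow.information_type
        and n.recipient_role == flow.recipient
    ]
    if any(n.deontic is Deontic.PROHIBITION for n in matching):
        return "inappropriate"
    if matching:
        return "appropriate"
    return "no applicable norm"


# Invented example norms and flows, not taken from the paper.
norms = [
    Norm("healthcare", "diagnosis", "specialist", Deontic.PERMISSION),
    Norm("healthcare", "diagnosis", "employer", Deontic.PROHIBITION),
]
referral = InformationFlow(
    context="healthcare", sender="physician", recipient="specialist",
    subject="patient", information_type="diagnosis",
    transmission_principle="with patient consent",
)
leak = InformationFlow(
    context="healthcare", sender="physician", recipient="employer",
    subject="patient", information_type="diagnosis",
    transmission_principle="without patient knowledge",
)

print(judge_flow(referral, norms))  # appropriate
print(judge_flow(leak, norms))      # inappropriate
```

The same piece of information (a diagnosis) is judged differently depending on who receives it and in which context, which is the core idea of Contextual Integrity.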
Method
Extract normative simulacra from fiction: CI information-flow tuples and norms structured according to Raz's anatomy of norms. Fine-tune LLMs with SFT, then with GRPO, using a composite reward function that includes normative grounding and per-completion contrastive scoring against incorrect normative universes.
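The GRPO step can be sketched as follows. GRPO samples a group of completions for the same prompt and scores each one relative to the rest of its group, typically by subtracting the group mean reward and dividing by the group standard deviation. The sketch below assumes that standard formulation. The component names and weights in `WEIGHTS` are hypothetical placeholders; the summary does not state how the paper weights its reward terms.

```python
from statistics import mean, pstdev

# Hypothetical component names and weights (assumptions, not from the paper).
WEIGHTS = {"programmatic": 0.2, "grounding": 0.4, "contrastive": 0.4}


def composite_reward(components, weights=WEIGHTS):
    """Weighted sum of per-completion reward components."""
    return sum(weights[name] * components[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one prompt, with invented component scores.
group = [
    {"programmatic": 1.0, "grounding": 0.9, "contrastive": 0.8},
    {"programmatic": 1.0, "grounding": 0.5, "contrastive": 0.1},
    {"programmatic": 0.0, "grounding": 0.2, "contrastive": 0.0},
]
rewards = [composite_reward(c) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below average are discouraged. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.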
In practice
- Use fiction to generate diverse, context-rich training data for privacy reasoning.
- Implement GRPO with programmatic and LLM-judged rewards for nuanced alignment.
- Employ contrastive scoring to prevent models from memorizing source-specific norms.
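The contrastive scoring idea in the last point can be illustrated with a toy sketch: score a completion against the normative universe it should be grounded in, then subtract its average score against universes from other sources. The phrase-matching `grounding_score` below is a deliberately crude stand-in for whatever programmatic or LLM-judged scorer is actually used, and the two universes and their norm phrases are invented for illustration.

```python
from statistics import mean


def grounding_score(completion, universe):
    """Fraction of a universe's norm phrases that the completion mentions."""
    text = completion.lower()
    hits = sum(1 for phrase in universe if phrase.lower() in text)
    return hits / len(universe)


def contrastive_score(completion, correct_universe, wrong_universes):
    """Score against the correct universe minus the mean over wrong ones."""
    own = grounding_score(completion, correct_universe)
    others = mean([grounding_score(completion, u) for u in wrong_universes])
    return own - others


# Invented norm phrases for two hypothetical normative universes.
regency = ["letters are private", "family reputation"]
dystopia = ["the party monitors", "diaries are forbidden"]

grounded = ("She should not share it: letters are private, "
            "and family reputation is at stake.")
misplaced = "Sharing is wrong because diaries are forbidden."
generic = "Sharing is inappropriate."

print(contrastive_score(grounded, regency, [dystopia]))   # 1.0
print(contrastive_score(misplaced, regency, [dystopia]))  # -0.5
print(contrastive_score(generic, regency, [dystopia]))    # 0.0
```

A completion that reasons from the wrong universe's norms is penalized, and a generic answer that cites no norms earns nothing, so only reasoning grounded in the correct context is rewarded.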
Topics
- Contextual Integrity
- Large Language Models
- Normative Simulacra
- Reinforcement Learning
- Supervised Fine-tuning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.