Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

2026-04-24 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Advanced, extended

Summary

Researchers from Cornell Tech propose a method for improving the contextual privacy reasoning of Large Language Models (LLMs) by training them on "normative simulacra" extracted from novels. The approach targets the misalignment between how LLMs handle information and what people expect with respect to privacy, which existing methods often fail to resolve because they either incur high inference costs or rely on narrow, task-specific fine-tuning. The team extracts structured representations of norms and information flows from 10 public-domain novels, including "Pride and Prejudice" and "1984," creating machine-readable "normative universes." These simulacra are then used to train LLMs via supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO). A composite reward function, whose components include normative grounding and per-completion contrastive scoring, guides the GRPO phase. Across five Contextual Integrity (CI)-aligned benchmarks, GRPO with normative grounding achieves the highest score on a law-compliance task (GoldCoin-HIPAA) and the strongest correlation with human privacy expectations (Pearson r on ConfAIde), indicating that the learned reasoning transfers to real-world domains.

Key takeaway

For research scientists developing privacy-aware AI agents, this work suggests a training approach worth evaluating. Consider using richly realized narrative texts, such as novels, to generate diverse normative training data. Combined with reinforcement learning via GRPO and contrastive scoring, this data can give LLMs a more robust, transferable understanding of contextual privacy that goes beyond narrow, task-specific compliance.

Key insights

Fiction-derived normative simulacra and GRPO can teach LLMs transferable contextual privacy reasoning.

Principles

Contextual Integrity defines privacy as appropriate information flow within context-relative norms.
Norms are prescriptive expectations about obligations, prohibitions, permissions, or recommendations.
Fiction provides rich, layered examples of normative reasoning in fully-realized social contexts.
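
To make these principles concrete, the sketch below shows one way an information flow and a norm could be represented as data. The five flow fields follow the standard Contextual Integrity parameters; the class names, the simplified `Norm` fields, and the `flow_violates` helper are assumptions for illustration and are not the paper's schema. The example flows are hand-written from the plot of "Pride and Prejudice," not taken from the paper's extracted data.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """The four kinds of prescriptive expectation listed above."""
    OBLIGATION = "obligation"
    PROHIBITION = "prohibition"
    PERMISSION = "permission"
    RECOMMENDATION = "recommendation"


@dataclass(frozen=True)
class InformationFlow:
    """One information flow, described by the five CI parameters."""
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


@dataclass(frozen=True)
class Norm:
    """Hypothetical, heavily simplified norm: a modality applied to
    one information type reaching one recipient."""
    modality: Modality
    information_type: str
    recipient: str


def flow_violates(flow: InformationFlow, norm: Norm) -> bool:
    """True when the flow is exactly what a prohibition forbids."""
    return (
        norm.modality is Modality.PROHIBITION
        and flow.information_type == norm.information_type
        and flow.recipient == norm.recipient
    )


# Hand-written example: Darcy confides a family matter to Elizabeth.
confided = InformationFlow(
    "Darcy", "Elizabeth", "Georgiana", "family scandal", "in confidence"
)
# A hypothetical onward flow that would breach that confidence.
gossip = InformationFlow(
    "Elizabeth", "Meryton neighbours", "Georgiana", "family scandal", "gossip"
)
no_gossip = Norm(Modality.PROHIBITION, "family scandal", "Meryton neighbours")

print(flow_violates(confided, no_gossip))
print(flow_violates(gossip, no_gossip))
```

The same information type is acceptable in one flow and a violation in the other, which is the core CI claim: appropriateness depends on the context of the flow, not on the information alone.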

Method

Extract normative simulacra from fiction: CI information-flow tuples and norms structured according to Raz's anatomy of norms. Fine-tune LLMs with SFT, then train with GRPO using a composite reward function that includes normative grounding and per-completion contrastive scoring against incorrect normative universes.
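
As a rough illustration of the GRPO step, the sketch below combines several reward components into one scalar and then normalizes rewards within a group of completions sampled for the same prompt, which is the group-relative advantage that gives GRPO its name. The component names and weights are hypothetical, since this summary does not list the paper's actual values, and the use of the population standard deviation is an assumption (implementations differ).

```python
from statistics import mean, pstdev


def composite_reward(components: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted sum of reward components (names and weights are illustrative)."""
    return sum(weights[name] * components[name] for name in weights)


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """Normalize each reward by the mean and std of its own group.

    All rewards must come from completions of the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical weights for three reward components.
weights = {"format": 0.2, "grounding": 0.5, "contrastive": 0.3}

# Hypothetical component scores for four completions of one prompt.
group = [
    {"format": 1.0, "grounding": 0.9, "contrastive": 0.8},
    {"format": 1.0, "grounding": 0.4, "contrastive": 0.1},
    {"format": 0.0, "grounding": 0.6, "contrastive": 0.5},
    {"format": 1.0, "grounding": 0.1, "contrastive": 0.0},
]

rewards = [composite_reward(c, weights) for c in group]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.2f}")
```

Because advantages are computed relative to the group, no separate value model is needed; a completion is reinforced only to the extent that it beats its siblings for the same prompt.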

In practice

Use fiction to generate diverse, context-rich training data for privacy reasoning.
Implement GRPO with programmatic and LLM-judged rewards for nuanced alignment.
Employ contrastive scoring to prevent models from memorizing source-specific norms.
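
The contrastive idea in the last point can be sketched as follows: score each completion against the normative universe it should be grounded in, then subtract its best score against the other universes. Everything here is an assumption for illustration. In particular, `toy_grounding` is a keyword-overlap stand-in for whatever programmatic or LLM-judged grounding score is used in practice, taking the maximum over the other universes is one choice among several, and the norm strings are invented.

```python
from typing import Callable, Sequence

Universe = Sequence[str]  # hypothetical: a universe as a list of norm statements


def toy_grounding(completion: str, universe: Universe) -> float:
    """Fraction of the universe's norms quoted in the completion (toy scorer)."""
    text = completion.lower()
    hits = sum(1 for norm in universe if norm.lower() in text)
    return hits / len(universe)


def contrastive_score(
    completion: str,
    own_universe: Universe,
    other_universes: Sequence[Universe],
    grounding: Callable[[str, Universe], float] = toy_grounding,
) -> float:
    """Grounding in the correct universe minus the best grounding elsewhere."""
    own = grounding(completion, own_universe)
    if not other_universes:
        return own
    best_other = max(grounding(completion, u) for u in other_universes)
    return own - best_other


# Invented norm statements standing in for two extracted universes.
universe_a = ["letters are private"]
universe_b = ["all speech is monitored"]

specific = "Because letters are private, she should not read it aloud."
generic = "Letters are private, but all speech is monitored anyway."

print(contrastive_score(specific, universe_a, [universe_b]))
print(contrastive_score(generic, universe_a, [universe_b]))
```

A completion that fits every universe equally well earns no contrastive credit, so the reward favors reasoning tied to the norms of the context actually at hand.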

Topics

Contextual Integrity
Large Language Models
Normative Simulacra
Reinforcement Learning
Supervised Fine-tuning

Code references

huggingface/trl

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.