LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
Summary
The PropMe framework and SimpleTrace pipeline introduce a propensity-aware evaluation of memorization in Large Language Models (LLMs), contrasting prefix-based capability attacks with non-adversarial use. PropMe proposes a transformation that can be applied to existing memorization metrics, while SimpleTrace, built on infini-gram, deterministically attributes generations to training corpora and computes verbatim, near-verbatim, and propensity-transformed metrics. Evaluating the Comma and DFM Decoder models on the Common Pile and Dynaword datasets, the authors found a consistent gap: prefix attacks elicited substantially stronger memorization signals than generic or dataset-specific prompts, and propensity scores were low overall. DFM Decoder, which was continually pre-trained from Comma, showed reduced memorization of and propensity to reproduce Common Pile, suggesting that memorization capability can decrease with further training on partially different data.
Key takeaway
If you are an AI scientist or ML engineer developing or deploying LLMs, integrate both capability and propensity evaluations into your memorization audits. Relying solely on worst-case extractability overstates practical leakage risk, while ignoring it misses critical vulnerabilities. Report both to give a comprehensive view of data leakage, especially where legal compliance under regulations such as GDPR and the EU AI Act is at stake.
Key insights
LLMs can leak training data under adversarial prompts but rarely do so in ordinary, non-adversarial use.
Principles
- Distinguish LLM capability from propensity in evaluations.
- Continual pre-training on partially different data can reduce memorization of earlier training data.
- Propensity metrics are meaningful only when adversarial and non-adversarial results are reported together.
Method
PropMe contrasts generic and dataset-specific prompts (propensity) with prefix attacks (capability), applying a transformation to standard memorization metrics. SimpleTrace traces model outputs back to training data, using infini-gram for deterministic attribution.
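To make the capability versus propensity contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: a toy in-memory n-gram set stands in for an infini-gram index, whitespace splitting stands in for a real tokenizer, and the "gap" is a plain difference of means rather than PropMe's actual metric transformation, which this summary does not specify. All function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokens; a real audit would use the model's tokenizer.
    return text.lower().split()


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    # All contiguous n-token windows, in order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(corpus_docs: Iterable[str], n: int) -> Set[NGram]:
    # Toy stand-in for a corpus index: the set of every n-gram in the corpus.
    index: Set[NGram] = set()
    for doc in corpus_docs:
        index.update(ngrams(tokenize(doc), n))
    return index


def verbatim_rate(generation: str, index: Set[NGram], n: int) -> float:
    # Fraction of the generation's n-grams that occur verbatim in the corpus.
    grams = ngrams(tokenize(generation), n)
    if not grams:
        return 0.0  # generation shorter than n tokens
    return sum(g in index for g in grams) / len(grams)


def mean_rate(generations: Iterable[str], index: Set[NGram], n: int) -> float:
    # Average verbatim rate over a set of model outputs.
    rates = [verbatim_rate(g, index, n) for g in generations]
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    index = build_index(["the quick brown fox jumps over the lazy dog"], n=3)

    # Hypothetical outputs: one elicited by a training-data prefix, one by a generic prompt.
    prefix_attack_outputs = ["quick brown fox jumps over"]
    generic_prompt_outputs = ["a fox jumps over a fence"]

    capability = mean_rate(prefix_attack_outputs, index, n=3)
    propensity = mean_rate(generic_prompt_outputs, index, n=3)

    # Illustrative contrast only; not the PropMe transformation.
    print(f"capability={capability:.2f} propensity={propensity:.2f} "
          f"gap={capability - propensity:.2f}")
```

Reporting the two averages side by side, rather than only the worst case, mirrors the audit practice the summary recommends.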
In practice
- Implement PropMe to audit LLM data leakage risks.
- Use SimpleTrace for deterministic training data attribution.
- Assess models for GDPR and EU AI Act compliance.
Topics
- LLM Memorization
- Data Leakage
- Propensity Evaluation
- SimpleTrace
- infini-gram
- Continual Pre-training
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.