NAMESAKES: Probing Identity Memorization in Text-to-Image Models
Summary
Researchers introduced a black-box behavioral probe that distinguishes identity memorization from fabrication when text-to-image (T2I) models are prompted with personal names. The method requires no ground-truth photos, training data, or access to model internals, so it can be applied to any model that can be queried. To benchmark it, the authors built the "Namesakes" dataset, comprising 1,269 public figure names and faces grouped by fame level, alongside orthographically perturbed names. The probe was evaluated on T2I models including SDXL-Base, SDXL-Turbo, Flux1-Dev, and Flux1-Schnell. On SDXL-Base, it achieved an R^2 of 0.58 in predicting reference similarity and an AUC of 0.86 for distinguishing real from perturbed names. The other models showed R^2 values between 0.33 and 0.44, with AUCs exceeding 0.77. The probe relies on two scores: dispersion and centroid similarity.
Key takeaway
For AI security engineers or ethicists evaluating T2I model privacy, this black-box probe offers a way to detect identity memorization without ground-truth data or model internals. You can use its dispersion and centroid similarity scores to audit models for compliance or to assess the efficacy of unlearning methods. Be mindful of the dataset's demographic skew, and consider human validation for critical applications, because the statistical relationships are not guaranteed to hold for single samples.
Key insights
A black-box probe can reliably detect identity memorization in T2I models without ground-truth data.
Principles
- Memorized identities yield consistent generations.
- Fabricated identities produce dispersed, generic faces.
- Fame correlates with memorization likelihood.
Method
The probe uses two black-box scores: dispersion (δ), which measures how consistent a name's generations are with one another, and centroid similarity (s_cen), which compares a name's generations to those of other names in the Namesakes dataset.
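The two scores can be illustrated with a minimal sketch. The summary does not give the exact formulas, so the definitions here are assumptions: dispersion is taken as the mean pairwise cosine distance among face embeddings of one name's generations, and centroid similarity as the mean cosine similarity between that name's centroid and the centroids of other names. The function names and the choice of cosine distance are hypothetical, and the embeddings are assumed to come from some face encoder that is not shown.

```python
import math
from itertools import combinations


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def centroid(embs):
    """Mean of the unit-normalized embeddings for one name."""
    units = [unit(e) for e in embs]
    dim = len(units[0])
    return [sum(u[i] for u in units) / len(units) for i in range(dim)]


def dispersion(embs):
    """Assumed definition: mean pairwise cosine distance among generations."""
    pairs = list(combinations(embs, 2))
    if not pairs:
        raise ValueError("need at least two generations")
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def centroid_similarity(embs, other_names_embs):
    """Assumed definition: mean cosine similarity between this name's
    centroid and the centroid of each other name's generations."""
    c = centroid(embs)
    sims = [cosine(c, centroid(o)) for o in other_names_embs]
    return sum(sims) / len(sims)
```

Under these assumptions, a memorized name would be expected to show low dispersion, while a fabricated one would presumably show higher dispersion and a centroid closer to the generic faces produced for other names.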
In practice
- Audit T2I models for privacy compliance.
- Evaluate identity-unlearning method effectiveness.
- Assess demographic equity in memorization.
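As a rough sketch of how an audit along these lines might be scored, the snippet below computes a rank-based AUC for separating real names from perturbed names. Using dispersion alone as the score is an assumption; the summary does not say how the two probe scores are combined. The function name and the input values in the usage note are hypothetical, not results from the paper.

```python
def separation_auc(real_dispersions, perturbed_dispersions):
    """Rank-based AUC (Mann-Whitney statistic).

    Assumes lower dispersion indicates a real, memorized name, so each
    (real, perturbed) pair counts as a win when the real name's
    dispersion is lower, and as half a win on ties.
    """
    if not real_dispersions or not perturbed_dispersions:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for r in real_dispersions:
        for p in perturbed_dispersions:
            if r < p:
                wins += 1.0
            elif r == p:
                wins += 0.5
    return wins / (len(real_dispersions) * len(perturbed_dispersions))
```

For example, `separation_auc([0.1, 0.2, 0.4], [0.3, 0.5, 0.6])` uses made-up dispersion values for three real and three perturbed names. An AUC near 0.5 would suggest the score does not separate the two groups, while values closer to 1.0 would suggest real names are generated more consistently.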
Topics
- Text-to-Image Models
- Identity Memorization
- Black-box Probing
- Namesakes Dataset
- Privacy Auditing
- Diffusion Models
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, AI Security Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.