NAMESAKES: Probing Identity Memorization in Text-to-Image Models

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

Researchers introduced a black-box behavioral probe that distinguishes identity memorization from fabrication when text-to-image (T2I) models are prompted with personal names. The method requires no ground-truth photos, no access to training data, and no model internals, so it can be applied to any model that can be queried. To benchmark it, the authors built the "Namesakes" dataset, which comprises 1,269 public figure names and faces stratified by fame level, together with orthographically perturbed versions of the names. The probe was evaluated on SDXL-Base, SDXL-Turbo, Flux1-Dev, and Flux1-Schnell. On SDXL-Base it achieved an R^2 of 0.58 in predicting reference similarity and an AUC of 0.86 in distinguishing real from perturbed names. The other models showed R^2 values between 0.33 and 0.44, with AUCs exceeding 0.77. The probe relies on two scores, dispersion and centroid similarity.

Key takeaway

For AI security engineers and ethicists evaluating T2I model privacy, this black-box probe offers a way to detect identity memorization without ground-truth data or model internals. Its dispersion and centroid similarity scores can be used to audit models for compliance or to assess the efficacy of unlearning methods. Be mindful of the dataset's demographic skew, and consider human validation for critical applications: the reported relationships are statistical and are not guaranteed to hold for any single sample.

Key insights

A black-box probe can detect identity memorization in T2I models without ground-truth data, separating real from perturbed names with AUCs above 0.77 on all four tested models.

Principles

Memorized identities yield consistent generations.
Fabricated identities produce dispersed, generic faces.
Fame correlates with memorization likelihood.

Method

The probe uses two black-box scores: dispersion (δ), which measures how consistent a name's generations are with one another, and centroid similarity (s_cen), which compares a name's generations with others in the Namesakes dataset.
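The two scores can be illustrated with a minimal sketch. The summary does not give the exact formulas, so everything here is an assumption: dispersion is taken as the mean pairwise cosine distance between face embeddings of one name's generations, and centroid similarity as the cosine similarity between that name's centroid and the centroid of other names' generations. The function names and the use of plain lists as embeddings are hypothetical; a real pipeline would obtain the embeddings from a face-recognition encoder.

```python
import math
from itertools import combinations


def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def _cosine(u, v):
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(a * b for a, b in zip(u, v))


def _centroid(vectors):
    """Unit-normalized mean of a list of vectors."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return _normalize(mean)


def dispersion(embeddings):
    """ASSUMED form of delta: mean pairwise cosine distance among
    the embeddings of one name's generations. Lower means more consistent."""
    unit = [_normalize(e) for e in embeddings]
    pairs = list(combinations(unit, 2))
    if not pairs:
        raise ValueError("need at least two generations")
    return sum(1.0 - _cosine(u, v) for u, v in pairs) / len(pairs)


def centroid_similarity(embeddings, other_embeddings):
    """ASSUMED form of s_cen: cosine similarity between this name's centroid
    and the centroid of generations produced for other names."""
    own = _centroid([_normalize(e) for e in embeddings])
    ref = _centroid([_normalize(e) for e in other_embeddings])
    return _cosine(own, ref)
```

Under these assumed definitions, a memorized name would be expected to show low dispersion, while a fabricated one would show higher dispersion and a centroid close to the generic face produced for other names.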

In practice

Audit T2I models for privacy compliance.
Evaluate identity-unlearning method effectiveness.
Assess demographic equity in memorization.
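An audit along these lines might report how well a probe score separates real names from orthographically perturbed ones, which is what the AUC figures in the summary measure. The sketch below computes AUC directly from its rank definition. The function name, the choice of negated dispersion as the score, and the toy values are all hypothetical illustrations, not the paper's procedure or data.

```python
def auc(positive_scores, negative_scores):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative, counting ties as half a win."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy, made-up dispersion values (NOT from the paper).
real_name_dispersion = [0.12, 0.20, 0.35]
perturbed_name_dispersion = [0.30, 0.55, 0.60]

# ASSUMPTION: lower dispersion suggests memorization, so negate it
# to obtain a score where higher means "more likely a real, known name".
real_scores = [-d for d in real_name_dispersion]
perturbed_scores = [-d for d in perturbed_name_dispersion]

separation = auc(real_scores, perturbed_scores)
```

The same function could be applied before and after an unlearning step: if unlearning works, the separation for the targeted names would be expected to move toward 0.5.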

Topics

Text-to-Image Models
Identity Memorization
Black-box Probing
Namesakes Dataset
Privacy Auditing
Diffusion Models

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, AI Security Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.