PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs
Summary
PromptPrint is a systematic study of whether the natural language prompts people write in large language model (LLM) interactions carry a stable, author-identifiable signal that can serve as a behavioral biometric. Analyzing 20,680 real prompts from 1,034 users, the research establishes three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis" that identity is primarily encoded in surface-level word choice. Second, stylometric features exhibit a "uniqueness-consistency paradox": users are highly distinctive across populations but behaviorally inconsistent across contexts. Third, adversarial analysis reveals that identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. Together, these results demonstrate strong identification performance at scale and establish prompt-based identity as a viable behavioral biometric, with implications for LLM user modeling, security, and privacy.
Key takeaway
For AI security engineers building LLM applications, prompt-based identity matters for user authentication and abuse detection. To identify users, prioritize lexical analysis over semantic understanding, because identity signals are primarily encoded in surface-level word choice. These signals withstand minor lexical changes but degrade significantly under semantic paraphrasing, so systems that rely on them need adversarial defenses against identity spoofing.
Key insights
User identity can be reliably inferred from LLM prompts based on surface-level lexical patterns.
Principles
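- Lexical stability hypothesis: identity is primarily encoded in surface-level word choice.
- Stylometric features show a uniqueness-consistency paradox: users are distinctive across populations but inconsistent across contexts.
- Identity signals resist minor lexical changes but degrade under semantic paraphrasing.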
Method
PromptPrint systematically studies prompt-based identity using real user prompts to analyze lexical, syntactic, and discourse patterns.
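To make the idea of lexical identification concrete, here is a minimal sketch that matches a prompt to the enrolled user with the most similar word-count profile. It is an illustration only, not the study's pipeline: the bag-of-words representation, the cosine matcher, the function names (`lexical_profile`, `cosine`, `identify`), and the toy prompts are all assumptions introduced for this example.

```python
import math
import re
from collections import Counter


def lexical_profile(texts):
    """Bag-of-words counts over lowercase word tokens (surface form only)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(prompt, profiles):
    """Return the enrolled user whose lexical profile is closest to the prompt."""
    query = lexical_profile([prompt])
    return max(profiles, key=lambda user: cosine(query, profiles[user]))


# Toy enrollment data, invented for illustration (not from the study).
enrolled = {
    "alice": ["kindly summarize this report", "kindly rewrite the summary in bullets"],
    "bob": ["yo fix my code", "yo why does my code crash"],
}
profiles = {user: lexical_profile(prompts) for user, prompts in enrolled.items()}

predicted = identify("kindly shorten this report", profiles)
print(predicted)
```

A matcher like this looks only at which words appear, not at what the prompt means, which is the kind of surface-level signal the lexical stability hypothesis points to. A real system would need far more data per user and a stronger representation than raw counts.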
In practice
- Prioritize lexical analysis for prompt-based identity.
- Consider prompt length for identity consistency.
- Guard against semantic paraphrasing attacks.
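The difference between a minor lexical change and a semantic paraphrase can be illustrated with a simple token-overlap measure. The sketch below is a stand-in, assuming Jaccard overlap of word sets as a rough proxy for how much lexical signal survives an edit; the study's own adversarial procedure is not described here, and the example prompts and the helper names `tokens` and `jaccard` are hypothetical.

```python
import re


def tokens(text):
    """Set of lowercase word tokens in a prompt."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of distinct words the two prompts have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


# Invented example prompts (not from the study).
original = "please explain how this sorting function works"
minor_edit = "please explain how this sort function works"
paraphrase = "could you walk me through what the routine does to order items"

overlap_minor = jaccard(original, minor_edit)        # one word changed
overlap_paraphrase = jaccard(original, paraphrase)   # same intent, new wording

print(overlap_minor, overlap_paraphrase)
```

In this toy case the one-word edit keeps most of the vocabulary, while the paraphrase shares no words with the original, so a purely lexical identifier would have nothing left to match on. That is why paraphrasing is the attack worth planning for.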
Topics
- Behavioral Biometrics
- LLM Prompting
- Authorship Attribution
- Natural Language Processing
- User Modeling
- AI Security
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.