PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

2026-06-04 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

PromptPrint, a systematic study, investigates whether natural language prompts used in large language model (LLM) interactions contain a stable, author-identifiable signal for behavioral biometrics. Analyzing 20,680 real prompts from 1,034 users, the research establishes three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis" that identity is primarily encoded in surface-level word choice. Second, stylometric features exhibit a "uniqueness-consistency paradox," showing users are highly distinctive across populations but behaviorally inconsistent across contexts. Third, adversarial analysis reveals identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. These results demonstrate strong identification performance at scale, establishing prompt-based identity as a viable behavioral biometric with implications for LLM user modeling, security, and privacy.
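
To make the lexical stability hypothesis concrete, here is a minimal sketch of lexical identification. It assumes character-trigram count profiles and nearest-profile cosine matching, a generic stylometry baseline chosen for illustration; it is not PromptPrint's actual pipeline. The helper names (`char_ngrams`, `build_profiles`, `identify`) and the toy enrollment prompts are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, space-padded prompt."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(prompts_by_user: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Merge each user's enrollment prompts into one n-gram profile."""
    profiles = {}
    for user, prompts in prompts_by_user.items():
        profile = Counter()
        for prompt in prompts:
            profile.update(char_ngrams(prompt, n))
        profiles[user] = profile
    return profiles


def identify(prompt: str, profiles: dict[str, Counter], n: int = 3) -> tuple[str, float]:
    """Return the enrolled user whose profile is most similar to the prompt."""
    query = char_ngrams(prompt, n)
    scores = {user: cosine(query, profile) for user, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy enrollment data: two users with different surface habits.
enrollment = {
    "alice": ["pls summarize this doc, keep it short thx",
              "pls rewrite this email, short thx"],
    "bob": ["Could you kindly provide a detailed explanation of the following concept?",
            "Could you kindly elaborate on the following passage in detail?"],
}
profiles = build_profiles(enrollment)
user, score = identify("pls fix this paragraph, short thx", profiles)
print(user, round(score, 3))
```

Character n-grams are used here because they capture abbreviations, spelling, and punctuation habits directly; a semantic encoder would tend to map "pls fix this" and "Could you kindly fix this" close together and discard exactly those cues.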

Key takeaway

For AI Security Engineers building LLM applications, prompt-based identity matters for both user authentication and abuse detection. Systems that identify users should prioritize lexical analysis over semantic understanding, since identity signals are encoded mainly in surface-level word choice. Note, however, that while these signals withstand minor lexical changes, they degrade significantly under semantic paraphrasing: an adversary who paraphrases prompts may evade identification, so defenses should account for paraphrase-based attacks rather than rely on lexical matching alone.

Key insights

User identity can be reliably inferred from LLM prompts based on surface-level lexical patterns.

Principles

Lexical stability hypothesis: identity is in word choice.
Stylometric features show a uniqueness-consistency paradox: distinctive across users, inconsistent within a user.
Identity signals resist minor lexical changes but degrade under semantic paraphrasing.
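
The uniqueness-consistency paradox can be made measurable by computing two quantities per user over stylometric feature vectors. The sketch below assumes Euclidean distance and two hypothetical metrics: `intra_spread` (how much a user's own prompts vary; lower means more consistent) and `inter_separation` (how far the user's centroid sits from other users' centroids; higher means more unique). These definitions and the toy vectors are illustrative, not the paper's formulation.

```python
import math
from itertools import combinations
from statistics import mean

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Per-dimension mean of a user's feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def intra_spread(vectors: list[Vector]) -> float:
    """Mean pairwise distance among one user's prompts; lower = more consistent."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return mean(math.dist(u, v) for u, v in pairs)


def inter_separation(user: str, features: dict[str, list[Vector]]) -> float:
    """Mean distance from this user's centroid to other users' centroids; higher = more unique."""
    mine = centroid(features[user])
    others = [centroid(vecs) for other, vecs in features.items() if other != user]
    return mean(math.dist(mine, c) for c in others)


# Hypothetical 2-D stylometric vectors (e.g., mean word length, punctuation rate).
features = {
    "a": [[0.0, 0.0], [2.0, 0.0]],    # far from "b" but internally variable
    "b": [[10.0, 0.0], [10.0, 0.0]],  # perfectly consistent
}
for name, vecs in features.items():
    print(name, intra_spread(vecs), inter_separation(name, features))
```

User "a" illustrates the paradox in miniature: well separated from the population, yet with nonzero spread across its own prompts.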

Method

PromptPrint systematically studies prompt-based identity on 20,680 real prompts from 1,034 users, analyzing lexical, syntactic, and discourse patterns and comparing lexical representations with semantic encoders.
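
As a rough illustration of what lexical, syntactic, and discourse patterns can look like as numbers, here is a minimal feature extractor. The specific features (mean word length, type-token ratio, punctuation density, words per sentence, question rate, uppercase ratio) and the function name `stylometric_features` are hypothetical choices for this sketch; the paper's exact feature set is not given here.

```python
import re
import string


def stylometric_features(prompt: str) -> dict[str, float]:
    """Extract a small, hypothetical set of stylometric features from one prompt."""
    words = re.findall(r"[A-Za-z']+", prompt)
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    n_chars = len(prompt) or 1
    n_words = len(words) or 1
    n_sentences = len(sentences) or 1
    return {
        # Lexical: surface word choice
        "mean_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Syntactic proxies: punctuation density and sentence length
        "punct_per_char": sum(c in string.punctuation for c in prompt) / n_chars,
        "words_per_sentence": len(words) / n_sentences,
        # Discourse proxy: how often the user asks rather than instructs
        "question_rate": prompt.count("?") / n_sentences,
        "upper_ratio": sum(c.isupper() for c in prompt) / n_chars,
    }


f = stylometric_features("Fix this. Why does it fail?")
print(f)
```

Vectors like this can feed the distance metrics above; in practice the features would need scaling so that no single dimension dominates.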

In practice

Prioritize lexical analysis for prompt-based identity.
Account for prompt length when assessing identity consistency.
Guard against semantic paraphrasing attacks.
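
Before relying on lexical identity signals, it can help to measure how much of the signal survives perturbation. This minimal sketch assumes character-trigram cosine similarity as a stand-in identity score and compares a single adjacent-character swap (a minor lexical perturbation) with a hand-written paraphrase. The names `swap_adjacent` and `retention` and the sample strings are hypothetical; a real evaluation would use model-generated paraphrases and the deployed identification model.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def swap_adjacent(text: str, index: int) -> str:
    """Minor lexical perturbation: swap the characters at index and index + 1."""
    if not 0 <= index < len(text) - 1:
        return text
    chars = list(text)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def retention(original: str, variant: str, n: int = 3) -> float:
    """Share of lexical signal retained, as cosine between n-gram profiles."""
    return cosine(char_ngrams(original, n), char_ngrams(variant, n))


original = "pls summarize this doc, keep it short thx"
typo = swap_adjacent(original, 5)
paraphrase = "Could you provide a brief summary of this document?"
print(round(retention(original, typo), 3), round(retention(original, paraphrase), 3))
```

In this toy pair the typo keeps most trigrams while the paraphrase shares few, mirroring the reported pattern that minor lexical edits leave identity signals largely intact while semantic paraphrasing degrades them.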

Topics

Behavioral Biometrics
LLM Prompting
Authorship Attribution
Natural Language Processing
User Modeling
AI Security

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.