Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
Summary
A study analyzed 14,727 security and privacy (S&P) prompts extracted from 3.2 million user-LLM conversations in the WildChat dataset. The researchers categorized these prompts into nine areas, performed a thematic analysis on 450 sampled prompts, and evaluated how LLMs respond to 270 advice-seeking prompts. Commercial LLMs such as GPT 5.5 significantly outperformed open-weight models such as Llama 4, providing "good enough" responses for 98% versus 47% of prompts, respectively. However, even the high-quality commercial models sometimes produced contradictory responses when the same prompt was posed 10 times, which risks confusing or misleading users who seek S&P guidance.
Key takeaway
If you are an AI security engineer or NLP engineer deploying LLMs for user-facing security and privacy assistance, account for response consistency. Commercial models such as GPT 5.5 deliver higher-quality S&P advice ("good enough" responses for 98% of prompts), but they can still contradict themselves across repeated runs of the same prompt, which introduces significant risk. Implement validation mechanisms that detect and mitigate inconsistent or misleading S&P guidance in your LLM deployments.
Key insights
Commercial LLMs offer better security and privacy advice but exhibit concerning response inconsistency.
Principles
- Commercial LLMs generally outperform open-weight models for S&P queries.
- LLM response consistency is crucial for reliable advice.
Method
S&P prompts were extracted from WildChat, categorized, and thematically analyzed. Responses to sampled advice-seeking prompts were then evaluated for quality, and for consistency by posing each prompt 10 times.
In practice
- Prioritize commercial LLMs for S&P advice, but implement consistency checks.
- Evaluate LLM outputs for contradictory guidance across multiple runs.
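One way to approach the consistency check described above is to pose the same prompt several times and flag prompts whose answers disagree. The sketch below is a minimal illustration, not the study's evaluation procedure: `ask` is a hypothetical callable wrapping whatever LLM client you use, and `classify_stance` is a deliberately crude, assumed heuristic that only looks at the first word of each response. A real deployment would likely need a stronger comparison, such as human review or a separate judging step.

```python
from collections import Counter
from typing import Callable


def classify_stance(response: str) -> str:
    """Toy heuristic (an assumption): map a response to 'yes', 'no', or 'unclear'
    by inspecting only its first word."""
    words = response.strip().lower().split()
    first = words[0].strip(".,!:;") if words else ""
    if first in {"yes", "safe"}:
        return "yes"
    if first in {"no", "unsafe"}:
        return "no"
    return "unclear"


def consistency_report(ask: Callable[[str], str], prompt: str, runs: int = 10) -> dict:
    """Pose `prompt` `runs` times via the hypothetical `ask` callable and
    summarize how often the extracted stances agree."""
    stances = [classify_stance(ask(prompt)) for _ in range(runs)]
    counts = Counter(stances)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "counts": dict(counts),
        "majority": majority,
        # Share of runs that match the most common stance.
        "agreement": majority_count / runs,
        # Flag prompts that received both a 'yes' and a 'no' across runs.
        "contradictory": counts["yes"] > 0 and counts["no"] > 0,
    }
```

Prompts flagged as `contradictory`, or with low `agreement`, could be routed to manual review or answered with a fixed, vetted response instead of a freshly generated one.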
Topics
- Large Language Models
- Security and Privacy
- User Prompts
- LLM Response Quality
- Model Consistency
- WildChat Dataset
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.