How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
Summary
SearchGEO is a new evaluation framework that measures how vulnerable large language model (LLM)-based search agents are to endorsing manipulated web content. It combines a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. The researchers evaluated 13 LLM backends on 308 cases each and found large differences in attack success rate (ASR). Overall ASR ranged from 0.0% for Claude-Sonnet-4.6 to 31.4% for Gemini-3-Flash, and the strongest attack mode differed by model family. Deployment scaffolds either amplified or reduced ASR, depending on the backend. An auxiliary agent-skill probe, in which an endorsement implies an install command, exposed a sharp split: Claude over-rejects while GPT over-trusts. The findings support treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.
Key takeaway
AI security engineers deploying LLM search agents should evaluate endorsement vulnerability to web content manipulation as part of their safety assessments. Measure attack success rates separately for each backend, since the reported range runs from 0.0% for Claude-Sonnet-4.6 to 31.4% for Gemini-3-Flash. Add auxiliary probes in which an endorsement implies an action such as an install command, because models diverge there: Claude over-rejects while GPT over-trusts. Together these tests show how reliable an agent's recommendations remain when the search content is adversarial.
Key insights
LLM search agents differ widely in how readily they endorse manipulated web content, so safety evaluations need to test for this directly.
Principles
- Vulnerability patterns differ significantly across LLM backends.
- Deployment scaffolds can alter attack success rates.
- Recommendation reliability is a first-class safety dimension.
Method
The SearchGEO framework combines a web-evidence manipulation pipeline, a five-mode attack taxonomy, and output-level metrics to measure endorsement corruption in LLM search agents.
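As a rough illustration of how an overall and per-mode attack success rate could be tallied, here is a minimal Python sketch. It is not the SearchGEO implementation: the record fields (`mode`, `endorsed_target`), the placeholder mode names, and the definition of success as "the agent endorsed the manipulated target" are all assumptions made for this example, since the summary names neither the five attack modes nor the exact metrics.

```python
from collections import defaultdict


def attack_success_rate(cases):
    """Return (overall ASR, per-mode ASR) as fractions in [0, 1].

    Each case is a dict with two hypothetical fields:
      mode            -- label of the attack mode used for the case
      endorsed_target -- True if the agent endorsed the manipulated target
    """
    if not cases:
        raise ValueError("no cases to score")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for case in cases:
        totals[case["mode"]] += 1
        hits[case["mode"]] += int(case["endorsed_target"])
    overall = sum(hits.values()) / sum(totals.values())
    per_mode = {mode: hits[mode] / totals[mode] for mode in totals}
    return overall, per_mode


# Toy records with placeholder mode names (not the paper's taxonomy).
cases = [
    {"mode": "mode_a", "endorsed_target": True},
    {"mode": "mode_a", "endorsed_target": False},
    {"mode": "mode_b", "endorsed_target": False},
    {"mode": "mode_b", "endorsed_target": False},
]
overall, per_mode = attack_success_rate(cases)
print(overall, per_mode)
```

Keeping the per-mode breakdown alongside the overall figure matters here because the strongest attack mode is reported to differ by model family, which a single aggregate number would hide.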
In practice
- Evaluate LLM backends for endorsement vulnerability.
- Test agent behavior with install command probes.
- Assess scaffold impact on attack success rates.
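The last two checks above can be sketched in Python as follows. This is a hedged example under stated assumptions, not the paper's protocol: it assumes each install-command probe is labeled malicious or benign, treats "over-trust" as endorsing a malicious item and "over-rejection" as refusing a benign one, and measures scaffold impact as a simple difference in ASR. The function names and data layout are hypothetical.

```python
def probe_error_rates(results):
    """Return (over_trust_rate, over_reject_rate) for an install-command probe.

    results is a list of (is_malicious, agent_endorsed) boolean pairs.
    Assumed definitions for this sketch:
      over-trust  -- share of malicious items the agent endorsed
      over-reject -- share of benign items the agent refused
    """
    malicious = [endorsed for is_mal, endorsed in results if is_mal]
    benign = [endorsed for is_mal, endorsed in results if not is_mal]
    over_trust = sum(malicious) / len(malicious) if malicious else 0.0
    over_reject = (
        sum(not endorsed for endorsed in benign) / len(benign) if benign else 0.0
    )
    return over_trust, over_reject


def scaffold_delta(asr_bare, asr_scaffolded):
    """Positive means the scaffold amplified ASR; negative means it reduced it."""
    return asr_scaffolded - asr_bare


# Toy probe outcomes, invented for illustration only.
results = [(True, True), (True, False), (False, False), (False, True)]
over_trust, over_reject = probe_error_rates(results)
delta = scaffold_delta(0.10, 0.25)
print(over_trust, over_reject, delta)
```

Reporting the two error rates separately keeps an over-rejecting backend from looking safe simply because it refuses everything, and keeps an over-trusting one from looking helpful simply because it endorses everything.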
Topics
- LLM Search Agents
- Web Content Manipulation
- Endorsement Vulnerability
- SearchGEO Framework
- AI Safety Evaluation
- Attack Success Rate
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Security Engineer, AI Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.