One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

FORGE (Fake Online Recommendations in Generative Environments) is a benchmark that measures how often search-augmented large language models (LLMs) recommend fake products when the web content they retrieve has been polluted. It simulates pollution by locally rewriting real products in retrieved web pages into fake ones, covering 225 products, 15 categories, and 5 consumer scenarios. All 12 commercial and open-weights LLMs tested are vulnerable: a single polluted page yields a fooled rate of up to 27%, and replacing the top-3 results raises this to 73.8%. Vulnerability correlates with a model's lack of prior knowledge about the products. Reasoning often makes the problem worse by generating spurious social proof. The evaluated defenses have drawbacks of their own: skepticism prompting can backfire, increasing fooled rates by up to 44 percentage points (pp) on Gemini 3.1 Pro, while consensus filtering risks suppressing 52% to 79% of legitimate recommendations.

Key takeaway

Machine learning engineers deploying search-augmented LLM recommenders should assume that current models are highly susceptible to web content pollution, even when only a single retrieved page is polluted. Skepticism prompts and post-hoc consensus filtering are insufficient and can even worsen vulnerability. Instead, prioritize retrieval-time defenses such as source-credibility weighting, evidence diversification, and strong cross-document corroboration to build pollution-resilient systems.
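As a rough illustration of how source-credibility weighting and cross-document corroboration might be combined at retrieval time, the sketch below keeps a product only if it is mentioned by enough distinct domains whose summed credibility passes a threshold. It is not the paper's method: the function name, the per-domain weights, the thresholds, and the example domains and products are all hypothetical. Strict thresholds carry the same risk the summary notes for consensus filtering, namely suppressing legitimate recommendations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def corroborated_products(
    mentions: List[Tuple[str, str]],   # (source_domain, product_name) pairs
    credibility: Dict[str, float],     # hypothetical per-domain weights in [0, 1]
    min_sources: int = 2,              # assumed corroboration threshold
    min_score: float = 1.0,            # assumed summed-credibility threshold
    default_weight: float = 0.1,       # weight for domains with no known score
) -> List[str]:
    """Return products backed by enough distinct, sufficiently credible domains."""
    domains_by_product = defaultdict(set)
    for domain, product in mentions:
        # A set ensures repeated mentions from one domain count only once.
        domains_by_product[product].add(domain)

    kept = []
    for product, domains in domains_by_product.items():
        score = sum(credibility.get(d, default_weight) for d in domains)
        if len(domains) >= min_sources and score >= min_score:
            kept.append(product)
    return sorted(kept)


# Toy data: all domains and product names are invented for illustration.
mentions = [
    ("reviews.example", "AcmeBlend 500"),
    ("shop.example", "AcmeBlend 500"),
    ("spam.example", "ZentroMix X"),
    ("spam.example", "ZentroMix X"),
]
credibility = {"reviews.example": 0.9, "shop.example": 0.6}
print(corroborated_products(mentions, credibility))
```

In this toy case the product pushed repeatedly by a single unknown domain is dropped, while the product corroborated by two scored domains is kept. Loosening `min_sources` and `min_score` lets both through, which is the trade-off any such filter has to tune.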

Key insights

Generative recommenders are highly vulnerable to web content pollution, even from a single source, with current defenses largely ineffective.

Method

FORGE locally rewrites real product mentions in retrieved web pages into fake ones, then measures how often LLMs recommend the fake product.
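The procedure described above can be sketched in a few lines, assuming that pollution is a plain name substitution applied to the top-k retrieved pages and that a response counts as "fooled" when it mentions the fake product. The function names, the string-matching criterion, and the product names are illustrative assumptions, not the benchmark's actual implementation.

```python
import re
from typing import List


def pollute_page(page: str, real_name: str, fake_name: str) -> str:
    """Replace every mention of real_name with fake_name (case-insensitive)."""
    # A callable replacement avoids treating fake_name as a regex template.
    return re.sub(re.escape(real_name), lambda _: fake_name, page, flags=re.IGNORECASE)


def pollute_top_k(pages: List[str], real_name: str, fake_name: str, k: int) -> List[str]:
    """Rewrite only the first k retrieved pages and leave the rest untouched."""
    return [
        pollute_page(page, real_name, fake_name) if i < k else page
        for i, page in enumerate(pages)
    ]


def fooled_rate(responses: List[str], fake_name: str) -> float:
    """Fraction of model responses that mention the fake product."""
    if not responses:
        return 0.0
    hits = sum(fake_name.lower() in response.lower() for response in responses)
    return hits / len(responses)


# Toy data: product names and page text are invented for illustration.
pages = [
    "The AcmeBlend 500 is the best blender this year.",
    "We liked the AcmeBlend 500 for smoothies.",
    "Other blenders exist too.",
]
polluted = pollute_top_k(pages, "AcmeBlend 500", "ZentroMix X", k=1)
print(polluted[0])
```

Varying `k` from 1 to 3 mirrors the single-page and top-3 conditions reported in the summary. In a real evaluation the responses would come from the model under test given the polluted pages as retrieved context.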

Best for: AI Architect, Research Scientist, CTO, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.