These LLMs are the best at resisting Russian propaganda
Summary
The Estonian Language Institute (ELI), in collaboration with Propastop, has launched a new "Propaganda Resistance" benchmark that evaluates how well large language models avoid promoting Russian strategic narratives. The benchmark covers 14 categories of Russian influence operations, from Crimea's status to NATO's history. Models are tested with neutral, biased, and malicious prompts in English, Estonian, and Russian, and their responses are scored by an AI judge calibrated to Propastop experts. Among proprietary frontier models, Anthropic's Claude models performed best: Opus 4.7 achieved a mean score of 94.9, with 77 percent of its responses rated "Exemplary." OpenAI's GPT-5.4 scored 88.9. Newer models generally resist propaganda better than older ones, though Google's Gemini 2.5 Pro scored 82 and proved susceptible to malicious prompts, and Gemini 3.5 Flash scored 73. Many models, including Gemini 3.5 Flash, were also less resistant when prompted in Russian.
Key takeaway
AI developers and product managers deploying LLMs in geopolitically sensitive regions should rigorously test their models for propaganda susceptibility. The Estonian Language Institute's benchmark shows that performance varies significantly by model, prompt language, and prompt intent. Prioritize models with strong measured resistance, such as Anthropic's Claude Opus 4.7. Be aware that some models, including Google's Gemini 3.5 Flash, are less resistant when prompted in Russian, which poses a risk for multilingual deployments.
Key insights
LLMs vary significantly in resisting state-sponsored propaganda, with newer models generally performing better.
Principles
- LLM propaganda resistance is language-dependent.
- Malicious prompts reduce LLM resistance.
- AI can be used to evaluate propaganda resistance.
Method
The benchmark poses neutral, biased, and malicious prompts across 14 categories in English, Estonian, and Russian, and an AI judge calibrated to Propastop experts scores the responses.
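To make the shape of such an evaluation concrete, here is a minimal sketch of a test grid and a score summary. It is not the benchmark's actual code. The category identifiers are placeholders (the benchmark defines 14, of which only Crimea's status and NATO's history are named here), and the 0-100 score scale and the cutoff of 90 for an "Exemplary" response are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# Placeholder category identifiers (assumption): the benchmark defines 14,
# but only these two topics are named in this summary.
CATEGORIES = ["crimea_status", "nato_history"]
PROMPT_STYLES = ["neutral", "biased", "malicious"]
LANGUAGES = ["en", "et", "ru"]  # English, Estonian, Russian


@dataclass(frozen=True)
class TestCase:
    category: str
    style: str
    language: str


def build_grid(categories, styles, languages):
    """Return one test case per (category, prompt style, language) combination."""
    return [TestCase(c, s, l) for c, s, l in product(categories, styles, languages)]


def summarize(scores, exemplary_threshold=90.0):
    """Summarize judge scores for one model.

    Assumes scores lie on a 0-100 scale and that a response counts as
    "Exemplary" at or above `exemplary_threshold`. Both are hypothetical
    choices for this sketch, not the benchmark's definitions.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    exemplary = sum(1 for s in scores if s >= exemplary_threshold)
    return {
        "mean": mean(scores),
        "exemplary_share": exemplary / len(scores),
    }
```

With the full set of 14 categories, a grid built this way would contain 14 × 3 × 3 = 126 combinations per model, before counting multiple prompts per combination.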
In practice
- Evaluate LLMs for geopolitical narrative alignment.
- Test LLMs with prompts in target languages.
- Develop AI-driven content moderation tools.
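For teams acting on the advice to test in target languages, one possible check is to compare each language's mean score against an English baseline and flag large drops. The sketch below assumes results arrive as (language, score) pairs on a 0-100 scale. The function names and the 5-point tolerance are hypothetical and do not come from the benchmark.

```python
from collections import defaultdict
from statistics import mean


def language_gaps(results, baseline="en"):
    """Return each language's mean score minus the baseline language's mean.

    `results` is an iterable of (language, score) pairs. A negative gap
    means the model scored lower in that language than in the baseline.
    """
    by_language = defaultdict(list)
    for language, score in results:
        by_language[language].append(float(score))
    if baseline not in by_language:
        raise ValueError(f"no results for baseline language {baseline!r}")
    base_mean = mean(by_language[baseline])
    return {
        language: mean(scores) - base_mean
        for language, scores in by_language.items()
        if language != baseline
    }


def flag_regressions(gaps, max_drop=5.0):
    """List languages whose mean falls more than `max_drop` points below baseline.

    The 5-point default is an arbitrary illustrative tolerance.
    """
    return sorted(language for language, gap in gaps.items() if gap < -max_drop)
```

For example, a model averaging 92 in English, 90 in Estonian, and 81 in Russian would be flagged for Russian only under the default tolerance.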
Topics
- Large Language Models
- AI Benchmarking
- Propaganda Resistance
- Geopolitical Influence
- Model Safety
- Anthropic Claude
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Tech Journalist, AI Scientist, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.