These LLMs are the best at resisting Russian propaganda

2026-06-04 · Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

The Estonian Language Institute (ELI), in collaboration with Propastop, has launched a new "Propaganda Resistance" benchmark that evaluates how well large language models avoid promoting Russian strategic narratives. The benchmark identifies 14 categories of Russian influence operations, from Crimea's status to NATO's history. Models are tested with neutral, biased, and malicious prompts in English, Estonian, and Russian, and their responses are judged by an AI calibrated to Propastop experts. Anthropic's Claude models performed best among proprietary frontier models, with Opus 4.7 achieving a mean score of 94.9 and 77 percent "Exemplary" responses. OpenAI's GPT-5.4 scored 88.9. Newer models generally resist propaganda better than older ones, but Google's models scored lower: Gemini 2.5 Pro scored 82 and showed susceptibility to malicious prompts, and Gemini 3.5 Flash scored 73. Many models, including Gemini 3.5 Flash, also showed reduced resistance when prompted in Russian.

Key takeaway

AI developers and product managers deploying LLMs in geopolitically sensitive regions should rigorously test their models for propaganda susceptibility. The Estonian Language Institute's benchmark shows that performance varies significantly by model, prompt language, and prompt intent (neutral, biased, or malicious). Anthropic's Claude Opus 4.7 showed the strongest resistance among the proprietary frontier models tested. Models such as Google's Gemini 3.5 Flash showed reduced resistance when prompted in Russian, which poses a risk for international deployments.

Key insights

LLMs vary significantly in resisting state-sponsored propaganda, with newer models generally performing better.

Principles

LLM propaganda resistance is language-dependent.
Malicious prompts reduce LLM resistance.
AI can be used to evaluate propaganda resistance.

Method

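The benchmark uses neutral, biased, and malicious prompts across 14 categories of Russian influence operations, written in English, Estonian, and Russian. Responses are judged by an AI calibrated to Propastop experts.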
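
The method above can be sketched as a test grid plus an aggregation step. This is a minimal illustration, not the benchmark's actual code: the `Judgment` record, the 0-100 score scale, the language codes, and the assumption of one prompt per combination are all hypothetical. Only the three prompt styles, the three languages, the count of 14 categories, and the "Exemplary" label come from the article.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# Prompt styles and languages named in the article.
PROMPT_STYLES = ("neutral", "biased", "malicious")
LANGUAGES = ("en", "et", "ru")  # English, Estonian, Russian


@dataclass(frozen=True)
class Judgment:
    """One judged response (hypothetical record layout)."""
    category: str
    style: str
    language: str
    score: float  # assumed 0-100 scale
    label: str    # "Exemplary" is from the article; other labels are assumptions


def build_grid(categories):
    """Return every (category, style, language) combination to test."""
    return list(product(categories, PROMPT_STYLES, LANGUAGES))


def summarize(judgments):
    """Compute the mean score and the percentage of 'Exemplary' responses."""
    if not judgments:
        raise ValueError("no judgments to summarize")
    exemplary = sum(1 for j in judgments if j.label == "Exemplary")
    return {
        "mean_score": round(mean(j.score for j in judgments), 1),
        "exemplary_pct": round(100 * exemplary / len(judgments), 1),
    }
```

With 14 categories, this grid yields 14 × 3 × 3 = 126 test cells per model under the one-prompt-per-cell assumption. The real benchmark may use more prompts per cell.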

In practice

Evaluate LLMs for geopolitical narrative alignment.
Test LLMs with prompts in target languages.
Develop AI-driven content moderation tools.
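
Testing in target languages can be turned into a simple pre-deployment check. The sketch below compares per-language mean scores against an English baseline and flags languages where resistance drops. The function names, the input layout, and the 5-point threshold are illustrative assumptions and are not part of the benchmark.

```python
from statistics import mean


def language_gaps(scores, baseline="en"):
    """Return how far each language's mean score falls below the baseline.

    scores: dict mapping a language code to a list of judge scores.
    A positive gap means the model scored lower than in the baseline language.
    """
    base = mean(scores[baseline])
    return {
        lang: round(base - mean(vals), 2)
        for lang, vals in scores.items()
        if lang != baseline
    }


def flag_weak_languages(scores, baseline="en", max_drop=5.0):
    """List languages whose drop exceeds max_drop (an arbitrary example threshold)."""
    gaps = language_gaps(scores, baseline)
    return sorted(lang for lang, gap in gaps.items() if gap > max_drop)
```

For example, calling `flag_weak_languages({"en": [90, 92], "et": [88, 90], "ru": [70, 80]})` flags only `"ru"`, because its mean is 16 points below the English mean. The same comparison could be run per prompt style to see how much malicious prompts reduce resistance.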

Topics

Large Language Models
AI Benchmarking
Propaganda Resistance
Geopolitical Influence
Model Safety
Anthropic Claude

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Tech Journalist, AI Scientist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.