LLM-based Detection of Manipulative Political Narratives

2026-05-11 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A new computational framework leverages Large Language Models (LLMs) to detect and structure manipulative political narratives within large, unfiltered social media datasets. The pipeline processes over 1.2 million posts from X, Reddit, and Telegram, with an 80% German and 20% English split, collected between January and February 2025. It employs a prompt-based filtering step using the Qwen3.5-122B-A10B-FP8 model to differentiate manipulative content from legitimate critique, achieving an F1 score of 0.77 with high recall (0.92). Subsequently, posts are embedded using Qwen3-Embedding-8B, dimensionality-reduced with UMAP, and clustered into 41 distinct narrative groups using HDBSCAN with a minimum cluster size of 400. Finally, the Qwen3.5-397B-A17B-FP8 model extracts detailed strategic narratives for each cluster, identifying themes like "The Great Replacement" and "The Proxy War."
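The summary gives F1 and recall for the filtering step but not precision. As a quick sanity check, the implied precision can be backed out, assuming the reported F1 is the standard harmonic mean of precision and recall for the manipulative class (an assumption; the summary does not say how the scores were averaged). The function name is illustrative.

```python
def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for this F1/recall pair")
    return f1 * recall / denom


# Figures reported in the summary for the filtering step.
implied_precision = precision_from_f1(f1=0.77, recall=0.92)
print(f"implied precision ~ {implied_precision:.2f}")
```

Under that assumption the implied precision is roughly 0.66, so the filter is recall-oriented: it misses few manipulative posts at the cost of passing a noticeable share of false positives on to the clustering stage.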

Key takeaway

For NLP engineers and research scientists working on disinformation detection, this framework shows how to identify and structure manipulative narratives in real-world social media data. Consider combining prompt-based LLM filtering with unsupervised clustering to move beyond traditional topic modeling and capture manipulative intent, especially on large, uncurated datasets. This can help surface emerging Foreign Information Manipulation and Interference (FIMI) campaigns without relying on predefined categories.

Key insights

LLM-driven pipelines can effectively identify and cluster manipulative political narratives in raw social media data.

Principles

Prompt-based filtering enhances FIMI detection.
Intent-driven embeddings improve narrative clustering.
Density-based clustering uncovers novel narrative groups.
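The first principle, prompt-based filtering, can be sketched as a binary classifier wrapped around an LLM call. This is a minimal sketch, not the paper's implementation: the prompt wording, the `make_filter` helper, and the `llm` callable (any function that takes a prompt string and returns the model's text reply) are all hypothetical. A keyword stub stands in for the model so the example runs without a model server.

```python
from typing import Callable

# Hypothetical prompt wording; the paper's actual prompt is not reproduced here.
FILTER_PROMPT = (
    "You are screening social media posts.\n"
    "Answer MANIPULATIVE if the post uses manipulative political framing, "
    "or LEGITIMATE if it is ordinary opinion or critique.\n"
    "Answer with one word.\n\n"
    "Post: {post}\nAnswer:"
)


def make_filter(llm: Callable[[str], str]) -> Callable[[str], bool]:
    """Wrap a text-in/text-out LLM callable as a yes/no post filter."""

    def is_manipulative(post: str) -> bool:
        reply = llm(FILTER_PROMPT.format(post=post))
        words = reply.strip().upper().split()
        # Only the first word counts; empty or unparseable replies are rejected.
        return bool(words) and words[0].strip(".,:;!\"'") == "MANIPULATIVE"

    return is_manipulative


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: keyword match only)."""
    post = prompt.rsplit("Post: ", 1)[1]
    return "MANIPULATIVE" if "secret plan" in post.lower() else "LEGITIMATE"


keep = make_filter(stub_llm)
print(keep("They have a secret plan to replace us"))
print(keep("I disagree with the new tax policy"))
```

Treating unparseable replies as rejections is a design choice made for this sketch. A recall-oriented filter might instead route them to a retry or to manual review.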

Method

The method involves prompt-based LLM filtering, intent-driven embedding generation, UMAP dimensionality reduction, HDBSCAN clustering, and LLM-based narrative labeling to structure manipulative political narratives.
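The order of the five stages can be sketched as a small orchestration function. Everything here is an assumption made for illustration: `run_pipeline` and its parameter names are hypothetical, and each stage is passed in as a plain callable, so the sketch does not depend on any particular model or library. It also assumes that only posts passing the filter are embedded, and that the clusterer marks noise with the label -1 (the HDBSCAN convention).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def run_pipeline(
    posts: List[str],
    is_manipulative: Callable[[str], bool],               # stage 1: LLM filter
    embed: Callable[[List[str]], List[Vector]],           # stage 2: embeddings
    reduce_dims: Callable[[List[Vector]], List[Vector]],  # stage 3: e.g. UMAP
    cluster: Callable[[List[Vector]], List[int]],         # stage 4: e.g. HDBSCAN
    label_cluster: Callable[[List[str]], str],            # stage 5: LLM labeling
) -> Dict[int, Tuple[str, List[str]]]:
    """Return {cluster_id: (narrative_label, member_posts)}."""
    kept = [p for p in posts if is_manipulative(p)]
    if not kept:
        return {}
    labels = cluster(reduce_dims(embed(kept)))

    groups: Dict[int, List[str]] = defaultdict(list)
    for post, cluster_id in zip(kept, labels):
        if cluster_id != -1:  # skip points the clusterer treats as noise
            groups[cluster_id].append(post)

    # Label each cluster once, from its member posts.
    return {cid: (label_cluster(members), members) for cid, members in groups.items()}
```

Keying the result by cluster id rather than by narrative label avoids silently merging two clusters that happen to receive the same label.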

In practice

Use Qwen3.5-122B-A10B-FP8 for initial filtering.
Configure Qwen3-Embedding-8B so that embeddings reflect manipulative intent rather than topic alone.
Apply HDBSCAN with min_cluster_size=400 for clustering.
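The clustering step above can be sketched with umap-learn and scikit-learn. Only `min_cluster_size=400` comes from the summary; the UMAP settings (`n_components=5`, cosine metric, fixed seed) and both function names are assumptions, and the paper may use different values or the standalone `hdbscan` package. The heavy imports are deferred into the function so the small summary helper can be used without them.

```python
from collections import Counter
from typing import Dict, Sequence

MIN_CLUSTER_SIZE = 400  # value reported in the summary


def cluster_embeddings(embeddings, n_components: int = 5, random_state: int = 42):
    """Reduce with UMAP, then cluster with HDBSCAN; returns one label per row (-1 = noise)."""
    # Lazy imports: requires umap-learn and scikit-learn >= 1.3.
    import umap
    from sklearn.cluster import HDBSCAN

    reduced = umap.UMAP(
        n_components=n_components,  # assumed; not stated in the summary
        metric="cosine",            # assumed; a common choice for text embeddings
        random_state=random_state,
    ).fit_transform(embeddings)
    return HDBSCAN(min_cluster_size=MIN_CLUSTER_SIZE).fit_predict(reduced)


def summarize_labels(labels: Sequence[int]) -> Dict[str, int]:
    """Count clusters, noise points, and the largest cluster size."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "largest": max(counts.values(), default=0),
    }
```

With a minimum cluster size of 400, HDBSCAN does not report groups smaller than that as clusters, so their posts end up labeled as noise. Checking `summarize_labels(labels)["n_noise"]` after a run shows how much of the filtered data was left unclustered.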

Topics

LLM-based Narrative Detection
Manipulative Political Narratives
Foreign Information Manipulation and Interference
Prompt Engineering
Unsupervised Clustering

Code references

SinclairSchneider/manipulative_narrative_detection

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.