LLM-based Detection of Manipulative Political Narratives
Summary
A new computational framework leverages Large Language Models (LLMs) to detect and structure manipulative political narratives within large, unfiltered social media datasets. The pipeline processes over 1.2 million posts from X, Reddit, and Telegram, with an 80% German and 20% English split, collected between January and February 2025. It employs a prompt-based filtering step using the Qwen3.5-122B-A10B-FP8 model to differentiate manipulative content from legitimate critique, achieving an F1 score of 0.77 with high recall (0.92). Subsequently, posts are embedded using Qwen3-Embedding-8B, dimensionality-reduced with UMAP, and clustered into 41 distinct narrative groups using HDBSCAN with a minimum cluster size of 400. Finally, the Qwen3.5-397B-A17B-FP8 model extracts detailed strategic narratives for each cluster, identifying themes like "The Great Replacement" and "The Proxy War."
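The summary reports F1 and recall for the filtering step but not precision. As a rough sanity check, precision can be back-computed from the two reported numbers. The sketch below assumes the F1 score is the standard harmonic mean of precision and recall for the manipulative class; because the inputs are rounded, the result is only approximate, and the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent (need F1 < 2 * recall)")
    return f1 * recall / denominator


# Reported figures from the summary: F1 = 0.77, recall = 0.92.
precision = implied_precision(0.77, 0.92)
print(f"implied precision: {precision:.2f}")  # about 0.66
```

Under that assumption, the filter trades precision (roughly 0.66) for high recall, so it tends to over-include rather than miss manipulative posts.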
Key takeaway
For NLP engineers and research scientists working on disinformation detection, this framework offers a way to identify and structure manipulative narratives in real-world social media data. Consider combining prompt-based LLM filtering with unsupervised clustering to move beyond traditional topic modeling and capture manipulative intent, especially in large, uncurated datasets. This can help uncover emerging Foreign Information Manipulation and Interference (FIMI) campaigns without relying on predefined categories.
Key insights
LLM-driven pipelines can effectively identify and cluster manipulative political narratives in raw social media data.
Principles
- Prompt-based filtering enhances FIMI detection.
- Intent-driven embeddings improve narrative clustering.
- Density-based clustering uncovers novel narrative groups.
Method
The method involves prompt-based LLM filtering, intent-driven embedding generation, UMAP dimensionality reduction, HDBSCAN clustering, and LLM-based narrative labeling to structure manipulative political narratives.
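The five stages can be read as a simple chain in which each step consumes the previous step's output. The sketch below is only an orchestration skeleton: every stage is passed in as a callable, and all names (`run_pipeline`, `is_manipulative`, `embed`, `reduce_dims`, `cluster`, `label_cluster`) are hypothetical placeholders rather than the authors' code. It assumes the clustering stage marks noise points with the label -1, as HDBSCAN implementations do.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def run_pipeline(
    posts: List[str],
    is_manipulative: Callable[[str], bool],            # prompt-based LLM filter
    embed: Callable[[List[str]], List[Vector]],        # intent-driven embeddings
    reduce_dims: Callable[[List[Vector]], List[Vector]],  # e.g. UMAP
    cluster: Callable[[List[Vector]], List[int]],      # e.g. HDBSCAN, -1 = noise
    label_cluster: Callable[[List[str]], str],         # LLM narrative labeling
) -> Dict[int, Tuple[str, List[str]]]:
    """Return {cluster_id: (narrative_label, member_posts)}."""
    # 1. Keep only posts the filter flags as manipulative.
    kept = [post for post in posts if is_manipulative(post)]
    if not kept:
        return {}

    # 2-4. Embed, reduce dimensionality, and cluster the kept posts.
    cluster_ids = cluster(reduce_dims(embed(kept)))

    # Group posts by cluster, dropping noise points (label -1).
    groups: Dict[int, List[str]] = defaultdict(list)
    for post, cluster_id in zip(kept, cluster_ids):
        if cluster_id != -1:
            groups[cluster_id].append(post)

    # 5. Ask the labeling stage for one narrative per cluster.
    return {cid: (label_cluster(members), members) for cid, members in groups.items()}
```

Injecting the stages as functions keeps the skeleton independent of any particular model server or clustering library, so each stage can be swapped or stubbed out for testing.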
In practice
- Use Qwen3.5-122B-A10B-FP8 for initial filtering.
- Instruct Qwen3-Embedding-8B to embed posts by manipulative intent.
- Apply HDBSCAN with min_cluster_size=400 for clustering.
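These settings can be collected in one configuration object. The sketch below uses the model names and `min_cluster_size` value stated in the summary; the instruction wording, the `Instruct: ... Query: ...` template, and the names `PipelineConfig`, `instructed_input`, and `min_clustered_posts` are assumptions made for illustration, so check the embedding model's own documentation for the exact prompt format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Values taken from the summary.
    filter_model: str = "Qwen3.5-122B-A10B-FP8"
    embedding_model: str = "Qwen3-Embedding-8B"
    labeling_model: str = "Qwen3.5-397B-A17B-FP8"
    min_cluster_size: int = 400
    # Hypothetical wording; the summary does not give the actual instruction.
    embedding_instruction: str = (
        "Represent the manipulative intent of this political social media post"
    )


def instructed_input(config: PipelineConfig, post: str) -> str:
    """Prefix a post with the task instruction (assumed template)."""
    return f"Instruct: {config.embedding_instruction}\nQuery: {post}"


def min_clustered_posts(config: PipelineConfig, n_clusters: int) -> int:
    """Lower bound on clustered posts: every cluster has >= min_cluster_size members."""
    return config.min_cluster_size * n_clusters


config = PipelineConfig()
print(min_clustered_posts(config, 41))  # 16400
```

Under the usual reading of `min_cluster_size`, the 41 reported clusters imply that at least 16,400 posts ended up in a cluster, a small share of the 1.2 million posts collected.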
Topics
- LLM-based Narrative Detection
- Manipulative Political Narratives
- Foreign Information Manipulation and Interference
- Prompt Engineering
- Unsupervised Clustering
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.