mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection
Summary
The mdok-style system, submitted to SemEval-2026 Task 10, ranked 8th out of 52 submissions (roughly the 85th percentile) in conspiracy detection. It identifies conspiracy beliefs in Reddit comments, treating the problem as binary text classification. The system finetunes the Qwen3-32B large language model and uses data augmentation and self-training to compensate for limited training data. The approach was originally developed for machine-generated text detection, and its ranking suggests that it transfers competitively to conspiracy detection.
Key takeaway
AI engineers building specialized text classifiers from small datasets should consider finetuning a large language model such as Qwen3-32B with data augmentation and self-training. In SemEval-2026 Task 10 this recipe ranked 8th out of 52 submissions, even though it was adapted from a different domain (machine-generated text detection).
Key insights
Data augmentation and self-training can make LLM finetuning effective for specialized text classification tasks when labeled data is limited.
Principles
- Small datasets benefit from augmentation
- Self-training can improve performance when labels are scarce
Method
Finetune Qwen3-32B using data augmentation and self-training for binary text classification, adapting a method from machine-generated text detection.
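The summary gives no implementation details, so the following is only a minimal sketch of a generic self-training (pseudo-labeling) loop under stated assumptions, not the authors' code. `ToyClassifier`, `self_train`, the `threshold` and `rounds` parameters, and the example texts are all hypothetical. A tiny naive Bayes model stands in for the finetuned Qwen3-32B so that the loop runs with the Python standard library alone.

```python
import math
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a finetuned LLM: word-level naive Bayes."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, text):
        """Return the estimated probability that `text` belongs to class 1."""
        n_docs = sum(self.docs.values())
        log_scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            # Laplace-smoothed class prior and token likelihoods.
            score = math.log((self.docs[label] + 1) / (n_docs + 2))
            for tok in text.lower().split():
                score += math.log(
                    (self.counts[label][tok] + 1) / (total + len(self.vocab) + 1)
                )
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp[1] / (exp[0] + exp[1])


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Grow the labeled set with confident pseudo-labels, then retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = ToyClassifier().fit(*zip(*labeled))
    for _ in range(rounds):
        confident, remaining = [], []
        for text in pool:
            p = model.predict_proba(text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                remaining.append(text)  # too uncertain, keep for a later round
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
        model = ToyClassifier().fit(*zip(*labeled))
    return model, labeled


# Hypothetical toy data: 1 = conspiracy belief, 0 = no conspiracy belief.
seed_data = [
    ("the government hides the truth about aliens", 1),
    ("secret elites control the weather", 1),
    ("i had pasta for dinner", 0),
    ("the game last night was great", 0),
]
unlabeled_pool = [
    "elites hide the truth",
    "dinner was great last night",
    "what time is it",
]
model, grown = self_train(seed_data, unlabeled_pool, threshold=0.6)
```

With a real LLM, the `fit` call would be a finetuning run and `predict_proba` a forward pass. The confidence threshold trades pseudo-label quantity against label noise.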
In practice
- Apply data augmentation for scarce data
- Use self-training to get more out of limited labeled data
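The summary does not say which augmentations the system uses. As an illustration only, the sketch below applies two simple label-preserving perturbations (random word dropout and one adjacent-word swap) to multiply a small labeled set. The function names `augment` and `augment_dataset`, the `drop_prob` and `copies` parameters, and the sample comments are assumptions made for this example.

```python
import random


def augment(text, rng, drop_prob=0.1):
    """Return a perturbed copy of `text`: word dropout plus one adjacent swap."""
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        kept = tokens[:]  # never drop every word
    if len(kept) > 1:
        i = rng.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)


def augment_dataset(examples, copies=2, seed=0):
    """Keep the originals and add `copies` perturbed variants per example."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = list(examples)
    for text, label in examples:
        for _ in range(copies):
            out.append((augment(text, rng), label))  # label is unchanged
    return out


# Hypothetical toy data: 1 = conspiracy belief, 0 = no conspiracy belief.
comments = [
    ("they are hiding the real data from all of us", 1),
    ("the new update fixed the login bug for me", 0),
]
augmented = augment_dataset(comments, copies=2)
```

Perturbations this crude can change the meaning of short comments, so in practice it is worth spot-checking augmented samples before finetuning on them.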
Topics
- SemEval-2026 Task 10
- Conspiracy Detection
- Large Language Models
- Qwen3-32B
- Data Augmentation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.