mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection

2026-05-04 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

The mdok-style system, submitted to SemEval-2026 Task 10, ranked 8th out of 52 submissions (roughly the 85th percentile) in conspiracy detection. The task is to identify conspiracy beliefs in Reddit comments, which the system frames as binary text classification. It finetunes the Qwen3-32B large language model and uses data augmentation and self-training to compensate for limited training data. The approach was originally developed for machine-generated text detection and proved competitive in this new domain.

Key takeaway

For AI engineers building specialized text classifiers from limited datasets: consider combining data augmentation and self-training when finetuning a large language model such as Qwen3-32B. The approach was competitive at SemEval-2026 Task 10 (8th out of 52 submissions), even though it was adapted from a different domain.

Key insights

Data augmentation and self-training can make LLM finetuning effective for specialized text-classification tasks with limited data.

Principles

Small datasets benefit from augmentation
Self-training enhances model performance

Method

Finetune Qwen3-32B using data augmentation and self-training for binary text classification, adapting a method from machine-generated text detection.
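
The self-training part of this method can be sketched as a pseudo-labeling loop. This is a minimal illustration, not the authors' implementation: the tiny `NaiveBayes` class is a hypothetical stand-in for the finetuned Qwen3-32B classifier so the example runs with the standard library only, and the `threshold` and `rounds` parameters are assumptions not taken from the source.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny stand-in classifier (assumption); the source finetunes Qwen3-32B."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, text):
        """Return P(label == 1 | text) using add-one smoothing."""
        total_docs = sum(self.priors.values())
        log_scores = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log((self.priors[label] + 1) / (total_docs + 2))
            for tok in tokenize(text):
                score += math.log(
                    (self.counts[label][tok] + 1) / (total + len(self.vocab) + 1)
                )
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp[1] / (exp[0] + exp[1])


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Grow the training set with confident pseudo-labels, then refit.

    labeled: list of (text, label) pairs with label in {0, 1}
    unlabeled: list of texts
    Returns (model, training_pairs, texts_left_unlabeled).
    """
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(rounds):
        remaining, added = [], 0
        for text in pool:
            p = model.predict_proba(text)
            if p >= threshold or p <= 1 - threshold:
                # Confident prediction: adopt it as a pseudo-label.
                texts.append(text)
                labels.append(int(p >= 0.5))
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0:
            break  # nothing new to learn from
        model = NaiveBayes().fit(texts, labels)
    return model, list(zip(texts, labels)), pool
```

Calling `self_train(labeled_pairs, unlabeled_texts)` returns the refit model, the enlarged training set, and the texts that never cleared the confidence threshold. A higher threshold admits fewer but cleaner pseudo-labels, which is the usual trade-off to tune in such a loop.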

In practice

Apply data augmentation for scarce data
Utilize self-training for domain adaptation
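
As a rough illustration of augmenting scarce labeled data, the sketch below creates perturbed copies of each training example. The summary does not say which augmentations the system used, so random word dropout and adjacent-word swaps are illustrative stand-ins, and the function names and parameters (`augment`, `augment_dataset`, `drop_prob`, `n_swaps`, `copies`) are hypothetical.

```python
import random


def augment(text, rng, drop_prob=0.1, n_swaps=1):
    """Return a perturbed copy of text: word dropout, then adjacent swaps."""
    words = text.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    if not kept:
        kept = words[:1]  # never return an empty string for non-empty input
    for _ in range(n_swaps):
        if len(kept) < 2:
            break
        i = rng.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)


def augment_dataset(examples, copies=2, seed=0):
    """Keep the originals and append `copies` perturbed variants of each.

    examples: list of (text, label) pairs; labels are carried over unchanged.
    """
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    out = list(examples)
    for text, label in examples:
        for _ in range(copies):
            out.append((augment(text, rng, drop_prob=0.1, n_swaps=1), label))
    return out
```

With `copies=2`, a dataset of N examples grows to 3N. Perturbations like these assume the label survives small surface changes, which is worth spot-checking on short comments where a single dropped word can change the meaning.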

Topics

SemEval-2026 Task 10
Conspiracy Detection
Large Language Models
Qwen3-32B
Data Augmentation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.