mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
Summary
The mdok-style team participated in SemEval-2026 Task 9, which covers multilingual polarization detection across 22 languages and multiple cultural and event contexts. The task asks systems to identify online polarization along three axes (detection, type, and manifestation), with the goal of catching it before it escalates into hate speech and social fragmentation. The team approached the task by finetuning mid-sized Large Language Models (LLMs) for sequence classification. They used QLoRA, a parameter-efficient finetuning technique, and augmented the multilingual training data with anonymized, lower-cased, upper-cased, and homoglyph-substituted versions of the texts to make detection more robust.
Key takeaway
If you are a research scientist developing online content moderation systems, the main point is that QLoRA finetuning combined with augmented multilingual data is a practical recipe for polarization detection. Consider adopting similar data augmentation strategies and parameter-efficient finetuning techniques to improve the robustness and scalability of your models in diverse linguistic environments.
Key insights
Finetuning mid-sized LLMs with QLoRA and augmented data improves multilingual online polarization detection.
Principles
- Early polarization detection prevents online harm.
- Data augmentation enhances model robustness.
Method
Finetune mid-sized LLMs for sequence classification using QLoRA, augmenting the multilingual training data with anonymized, lower-cased, upper-cased, and homoglyph-substituted versions of each text.
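A QLoRA setup for sequence classification might look roughly like the sketch below, which assumes the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries. The summary does not report the team's base model or hyperparameters, so the rank, alpha, dropout, target modules, and the default of two labels are illustrative assumptions, and `build_qlora_classifier` is a hypothetical helper name.

```python
# Illustrative values only; the summary does not report the team's settings.
# target_modules assumes a Llama-style model with q_proj / v_proj layers.
QLORA_HYPERPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
}


def build_qlora_classifier(model_name: str, num_labels: int = 2):
    """Load a 4-bit quantized base model and attach LoRA adapters (hypothetical helper)."""
    # Heavy imports are deferred so this module can be imported without a GPU stack.
    import torch
    from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
    from peft import (
        LoraConfig,
        TaskType,
        get_peft_model,
        prepare_model_for_kbit_training,
    )

    # 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=num_labels,
        quantization_config=bnb_config,
    )
    # Prepares the quantized model for training (e.g. casts norm layers, enables input grads).
    model = prepare_model_for_kbit_training(model)

    # Only the low-rank adapters and the classification head are trained.
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, **QLORA_HYPERPARAMS)
    return get_peft_model(model, lora_config)
```

Decoder-only base models often ship without a padding token, so batched classification typically also requires setting `pad_token_id` on the tokenizer and model config before training.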
In practice
- Use QLoRA for efficient LLM finetuning.
- Augment text data with case-changed and homoglyph-substituted copies.
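The augmentation step can be sketched as follows. This is a minimal illustration rather than the team's pipeline: the homoglyph map, the placeholder tokens `[USER]` and `[URL]`, the anonymization rules, and all function names are assumptions made for the example.

```python
import re

# Hypothetical, deliberately small map from Latin letters to Cyrillic lookalikes.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic a
    "c": "\u0441",  # Cyrillic es
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "p": "\u0440",  # Cyrillic er
    "x": "\u0445",  # Cyrillic ha
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def anonymize(text: str) -> str:
    """Replace URLs and @mentions with placeholder tokens (assumed scheme)."""
    text = URL_RE.sub("[URL]", text)
    return MENTION_RE.sub("[USER]", text)


def homoglyph(text: str) -> str:
    """Swap every mapped Latin letter for its visually similar counterpart."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def augment(text: str, label: int) -> list[tuple[str, int]]:
    """Return the original example plus four label-preserving variants."""
    return [
        (text, label),
        (anonymize(text), label),
        (text.lower(), label),
        (text.upper(), label),
        (homoglyph(text), label),
    ]


if __name__ == "__main__":
    for variant, lbl in augment("Hello @bob see http://x.org", 1):
        print(lbl, variant)
```

Every variant keeps the original label, so the classifier sees the same polarization signal under surface perturbations that commonly appear in obfuscated or noisy online text.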
Topics
- SemEval-2026 Task 9
- Multilingual Polarization Detection
- Large Language Models
- QLoRA Finetuning
- Sequence Classification
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.