CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

2026-06-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

CATCH-ME is a large-scale, expert-curated, multilingual dataset of counterspeech dialogues that addresses online hate speech and misinformation together, two problems that NLP research usually treats in isolation. Existing counterspeech datasets are scarce, limited to single-turn English dialogues, and lack the multi-turn, multilingual complexity of real-life interactions. CATCH-ME's dialogues are anchored in verified external knowledge, such as fact-checking articles and NGO reports, and carry document- and chunk-level span annotations, which makes the dataset directly usable in Retrieval-Augmented Generation (RAG) systems. It covers five languages and hate directed at seven marginalized groups, and it supports training and evaluating more persuasive, factually grounded counterspeech models, improving LLM assistance for human counterspeech generation.

Key takeaway

For NLP engineers developing or evaluating counterspeech models, CATCH-ME addresses the main limitations of existing datasets. Its multi-turn, multilingual, factually grounded dialogues can be used to train more persuasive LLMs against hate and misinformation. Consider integrating the dataset into RAG-based systems to improve factual accuracy and reduce vague or repetitive responses in real-world applications.

Key insights

The CATCH-ME dataset provides expert-curated, multilingual, multi-turn dialogues that counter hate and misinformation and are grounded in verified sources, which makes them usable in RAG systems.

Principles

NLP research often treats hate speech and misinformation as separate problems.
LLMs need high-quality, multi-turn examples for effective counterspeech.
Factual grounding enhances counterspeech persuasiveness.

Method

The dataset is built in three steps: experts curate multi-turn dialogues, each dialogue is anchored in verified external knowledge such as fact-checking articles and NGO reports, and document- and chunk-level span annotations are added.
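
To make the annotation layers concrete, here is a minimal sketch of how one dialogue record might be represented. All class and field names (`SpanAnnotation`, `Turn`, `Dialogue`, `doc_id`, `chunk_id`, and so on) are hypothetical illustrations, not the dataset's actual schema, which this summary does not specify. The example values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class SpanAnnotation:
    """Links a counterspeech turn to supporting evidence (hypothetical schema)."""
    doc_id: str    # document-level reference, e.g. a fact-checking article
    chunk_id: int  # chunk-level reference inside that document
    start: int     # character offset where the supporting span begins
    end: int       # character offset where the supporting span ends (exclusive)


@dataclass
class Turn:
    speaker: str   # assumed roles: "hate" or "counterspeech"
    text: str
    evidence: list[SpanAnnotation] = field(default_factory=list)


@dataclass
class Dialogue:
    language: str      # language code, e.g. "en"
    target_group: str  # the group targeted by the hateful content
    turns: list[Turn] = field(default_factory=list)

    def cited_chunks(self) -> set[tuple[str, int]]:
        """Return every (doc_id, chunk_id) pair cited anywhere in the dialogue."""
        return {(s.doc_id, s.chunk_id) for t in self.turns for s in t.evidence}


# Placeholder content only; not taken from the dataset.
example = Dialogue(
    language="en",
    target_group="example-group",
    turns=[
        Turn("hate", "<hateful claim that repeats a false statistic>"),
        Turn(
            "counterspeech",
            "<reply that cites a fact-check>",
            evidence=[SpanAnnotation("factcheck-001", 2, 0, 120)],
        ),
    ],
)
print(example.cited_chunks())
```

Keeping the evidence links on each turn, rather than on the dialogue as a whole, is one way to preserve which reply relies on which chunk across a multi-turn exchange.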

In practice

Train LLMs for multi-turn, multilingual counterspeech.
Evaluate counterspeech models on factual grounding.
Develop RAG systems for hate and misinformation.
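
The retrieval and evaluation uses listed above can be sketched as follows. This is a toy illustration under stated assumptions: the token-overlap scorer stands in for a real retriever, and the function names (`retrieve`, `recall_at_k`, `build_prompt`), chunk identifiers, and texts are all hypothetical. None of them comes from the original article.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, chunk: str) -> int:
    """Count tokens shared between query and chunk (with multiplicity)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum((q & c).values())


def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k chunks with the highest overlap score."""
    ranked = sorted(
        chunks, key=lambda cid: overlap_score(query, chunks[cid]), reverse=True
    )
    return ranked[:k]


def recall_at_k(retrieved: list[str], gold: set[str]) -> float:
    """Fraction of gold (annotated) chunks that appear among the retrieved ones."""
    if not gold:
        return 0.0
    return len(gold & set(retrieved)) / len(gold)


def build_prompt(history: list[str], evidence: list[str]) -> str:
    """Assemble a generation prompt from dialogue history and retrieved evidence."""
    lines = ["Evidence:"] + [f"- {e}" for e in evidence]
    lines += ["", "Dialogue so far:"] + history
    lines += ["", "Write a factual, respectful counterspeech reply."]
    return "\n".join(lines)


# Placeholder knowledge base and dialogue; not taken from the dataset.
chunks = {
    "doc1#0": "Official statistics show crime rates fell over the last decade.",
    "doc1#1": "The report describes funding for local libraries.",
    "doc2#0": "A fact-check found the viral claim about crime rates was false.",
}
history = ["User: <claim that crime rates rose because of a minority group>"]
query = "claim crime rates rose"

top = retrieve(query, chunks, k=2)
gold = {"doc2#0"}  # chunk ids that the annotations mark as supporting evidence
print(top, recall_at_k(top, gold))
print(build_prompt(history, [chunks[cid] for cid in top]))
```

Chunk-level annotations are what make a metric like `recall_at_k` possible: without gold evidence links, retrieval quality could only be judged indirectly through the generated reply.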

Topics

CATCH-ME Dataset
Counterspeech Generation
Hate Speech
Misinformation
Retrieval-Augmented Generation
Multilingual NLP

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.