RacismoBR: A Manually Annotated Dataset for Racist Discourse Detection in Brazilian Portuguese

Source: Paper Index on ACL Anthology · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

RacismoBR is a culturally grounded dataset for detecting racist discourse in Brazilian Portuguese social media, covering both explicit and subtle forms of racism. It was manually annotated exclusively by Black researchers to ensure sociolinguistic validity. The dataset was used to evaluate three families of classifiers: classical machine learning models, supervised Transformer-based small language models, and large language models such as GPT-4.1 under in-context, few-shot learning. GPT-4.1 and BERTimbau achieved the highest Macro-F1 scores, but Wilcoxon signed-rank tests found no statistically significant differences across models because of high variability. Classifiers consistently showed higher precision for non-racist content and higher recall for racist content. Qualitative analysis revealed persistent difficulty with implicit, euphemized, and context-dependent racism, suggesting that culturally informed annotation matters more than architectural complexity for improving racism detection.

Key takeaway

Research scientists developing hate speech detection systems should prioritize culturally grounded dataset annotation over pursuing more advanced model architectures alone. Focus on sociolinguistic validity in data collection and annotation, especially for nuanced forms of racism such as euphemized or context-dependent discourse. This approach is likely to yield more robust and accurate classifiers than deploying the latest large language models without specialized data.

Key insights

Culturally grounded annotation is more critical than model architecture for effective racism detection.

Method

The study involved manual annotation of a Brazilian Portuguese dataset by Black researchers, followed by binary classification using classical ML, Transformer-based LMs, and few-shot LLMs, with performance evaluated via Macro-F1 and Wilcoxon tests.
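As an illustration of the evaluation metric, the following is a minimal sketch of Macro-F1 for binary classification in plain Python. The label encoding (1 for racist, 0 for non-racist), the function names, and the toy predictions are assumptions made for this example and do not come from the paper.

```python
def per_class_metrics(y_true, y_pred, labels=(0, 1)):
    """Return {label: (precision, recall, f1)} for each label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1, so both classes count equally."""
    metrics = per_class_metrics(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in metrics.values()) / len(labels)


# Toy labels: 1 = racist, 0 = non-racist (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]

metrics = per_class_metrics(y_true, y_pred)
score = macro_f1(y_true, y_pred)
print(metrics[0][:2], metrics[1][:2], score)
```

The toy classifier over-predicts the racist class, so it reaches precision 1.0 and recall 0.6 on non-racist content and precision 0.6 and recall 1.0 on racist content, for a Macro-F1 of 0.75. This is the same direction of asymmetry the summary reports: higher precision for non-racist content and higher recall for racist content.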

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.