A Multitask Transformer for Offensive Language Detection and Target Identification in HateBR

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, medium

Summary

A Multitask Learning (MTL) approach with a shared BERTimbau encoder is proposed for offensive language detection and target identification on the HateBR dataset. Rather than stopping at binary classification, the MTL architecture jointly predicts binary offensiveness, ordinal severity, and specific hate speech targets. In experiments, the MTL model raises the Matthews Correlation Coefficient (MCC) for offensive detection from 0.80 to 0.82 relative to single-task baselines. Joint training also produces hierarchically consistent predictions, with a 0% target-inconsistency rate: no comment is labeled "Non-offensive" while also being assigned a hate target. However, the model shows negative transfer on the fine-grained multilabel target task, where Micro-F1 drops from 0.59 to 0.42. This points to a trade-off between logical consistency and target attribution, especially under extreme class imbalance.
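For reference, the three evaluation quantities reported in the summary can be computed as follows. This is a minimal standard-library sketch with hypothetical function names and illustrative target labels. Dividing the inconsistency count by all comments (rather than only by those predicted non-offensive) is an assumption, since only the 0% figure is reported. Micro-F1 pools true positives, false positives, and false negatives across all target labels before computing F1, so frequent targets dominate the score, which matters under the class imbalance mentioned above.

```python
import math
from typing import List, Sequence, Set


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient for binary labels (1 = offensive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is reported as 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1: counts are pooled over every (comment, target) pair."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def target_inconsistency_rate(offensive_pred: Sequence[int],
                              target_pred: List[Set[str]]) -> float:
    """Share of all comments predicted non-offensive (0) that still get >= 1 target.

    Assumption: the denominator is the total number of comments.
    """
    flagged = sum(1 for o, t in zip(offensive_pred, target_pred) if o == 0 and t)
    return flagged / len(offensive_pred) if offensive_pred else 0.0


# Toy usage with made-up predictions (not the paper's data).
print(round(mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]), 3))  # 0.333
print(round(micro_f1([{"racism"}, {"sexism", "homophobia"}, set()],
                     [{"racism"}, {"sexism"}, {"partyism"}]), 3))  # 0.667
print(target_inconsistency_rate([1, 1, 0, 0],
                                [{"racism"}, set(), {"sexism"}, set()]))  # 0.25
```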

Key takeaway

Research scientists developing hate speech detection systems should consider Multitask Learning (MTL) with a shared encoder to improve offensive language detection and keep predictions hierarchically consistent. Be aware that while MTL improves the primary detection task, performance on fine-grained multilabel target identification may drop, especially on imbalanced datasets. Weigh your application's needs to decide how much detailed target attribution you can trade for consistency.

Key insights

Multitask learning improves offensive language detection consistency but may reduce fine-grained target attribution.

Principles

Method

A Multitask Learning (MTL) approach uses a shared BERTimbau encoder to predict binary offensiveness, ordinal severity, and hate speech targets simultaneously on the HateBR dataset.
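The method described here has one shared encoder feeding three prediction heads. The sketch below illustrates only the heads' joint training objective and a decoding step, in plain Python on raw logits for a single comment; the BERTimbau encoder itself is abstracted away. The loss choices (binary cross-entropy for offensiveness, a cumulative-threshold encoding for ordinal severity, per-target binary cross-entropy for the multilabel targets), the equal task weights, the four-level severity scale, and the `enforce_hierarchy` gate are all assumptions for illustration. The summary does not state how the authors parameterize the heads or whether they apply any post-hoc rule.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(z: float, y: float) -> float:
    # Stable binary cross-entropy on a raw logit z and a 0/1 label y.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def ordinal_loss(threshold_logits: Sequence[float], level: int) -> float:
    # Assumed cumulative encoding: threshold k asks "is severity > k?",
    # so level 2 on a hypothetical 0..3 scale becomes targets [1, 1, 0].
    terms = [bce_with_logits(z, 1.0 if level > k else 0.0)
             for k, z in enumerate(threshold_logits)]
    return sum(terms) / len(terms)


def multilabel_loss(target_logits: Sequence[float], labels: Sequence[int]) -> float:
    # Independent per-target BCE, averaged over the target vocabulary.
    terms = [bce_with_logits(z, y) for z, y in zip(target_logits, labels)]
    return sum(terms) / len(terms)


def joint_loss(out: Dict, gold: Dict, weights=(1.0, 1.0, 1.0)) -> float:
    # Weighted sum of the three task losses; in a real model, gradients of
    # all three terms would flow back into the single shared encoder.
    w_bin, w_ord, w_tgt = weights
    return (w_bin * bce_with_logits(out["offensive"], gold["offensive"])
            + w_ord * ordinal_loss(out["severity"], gold["severity"])
            + w_tgt * multilabel_loss(out["targets"], gold["targets"]))


def decode(out: Dict, threshold: float = 0.5,
           enforce_hierarchy: bool = True) -> Tuple[bool, int, List[int]]:
    is_offensive = sigmoid(out["offensive"]) >= threshold
    # Severity = number of cumulative thresholds the comment passes.
    severity = sum(1 for z in out["severity"] if sigmoid(z) >= threshold)
    targets = [i for i, z in enumerate(out["targets"]) if sigmoid(z) >= threshold]
    if enforce_hierarchy and not is_offensive:
        # Hypothetical post-hoc gate: non-offensive comments get no severity or targets.
        severity, targets = 0, []
    return is_offensive, severity, targets


# Toy usage: made-up logits for one comment with two target labels.
out = {"offensive": -1.0, "severity": [0.5, -1.0, -2.0], "targets": [2.0, -1.0]}
print(decode(out))                           # (False, 0, [])
print(decode(out, enforce_hierarchy=False))  # (False, 1, [0])
```

A hard gate like `enforce_hierarchy` makes the target-inconsistency rate 0% by construction. The summary attributes the 0% rate to joint training, so the gate is best read as a fallback option rather than a description of the reported model.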

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.