A Multitask Transformer for Offensive Language Detection and Target Identification in HateBR
Summary
This paper applies Multitask Learning (MTL) with a shared BERTimbau encoder to offensive language detection and target identification on the HateBR dataset. Rather than stopping at binary classification, the architecture jointly predicts binary offensiveness, ordinal severity, and specific hate speech targets. In the reported experiments, MTL raises the Matthews Correlation Coefficient (MCC) for offensive detection from 0.80 to 0.82 over single-task baselines. Joint training also yields hierarchical consistency: the target-inconsistency rate is 0%, meaning no comment is labeled "Non-offensive" while also being assigned a hate target. The fine-grained multilabel target task, however, shows negative transfer, with Micro-F1 dropping from 0.59 to 0.42. This points to a trade-off between logical consistency and target attribution, especially under extreme class imbalance.
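For readers less familiar with the Matthews Correlation Coefficient, the following is a minimal sketch of how it is computed for a binary offensive / non-offensive task. The function name, the zero-denominator convention, and the toy labels are assumptions made for illustration; they are not taken from the paper or from HateBR.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = offensive, 0 = non-offensive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a marginal is empty and MCC is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels invented for illustration only (not HateBR data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
score = matthews_corrcoef(y_true, y_pred)  # tp=3, tn=3, fp=1, fn=1 gives 0.5
```

MCC uses all four cells of the confusion matrix, so it ranges from -1 to 1 and stays informative when classes are unbalanced.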
Key takeaway
If you develop hate speech detection systems, consider Multitask Learning (MTL) with a shared encoder to improve offensive language detection and keep predictions hierarchically consistent. Expect weaker performance on fine-grained, multilabel target identification, especially with imbalanced datasets, and weigh consistency against detailed target attribution for your application.
Key insights
Multitask learning improves offensive language detection and hierarchical consistency, but can reduce fine-grained target attribution.
Principles
- Toxicity is hierarchical, not binary.
- Joint training can enforce logical consistency.
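The consistency principle can be checked with a simple metric. The sketch below counts predictions that are marked non-offensive yet carry at least one hate target. The function name and the choice of denominator (all comments) are assumptions; the summary does not specify how the paper normalizes the rate. This measures consistency after the fact and is not the training mechanism itself.

```python
def target_inconsistency_rate(offensive_preds, target_preds):
    """Share of comments predicted non-offensive (0) that still have a predicted target.

    offensive_preds: list of 0/1 predictions, one per comment.
    target_preds: list of multi-hot lists, one per comment.
    Assumption: the rate is normalized by the total number of comments.
    """
    if len(offensive_preds) != len(target_preds):
        raise ValueError("prediction lists must have the same length")
    if not offensive_preds:
        return 0.0
    inconsistent = sum(
        1
        for offensive, targets in zip(offensive_preds, target_preds)
        if offensive == 0 and any(targets)
    )
    return inconsistent / len(offensive_preds)


# Toy predictions invented for illustration only.
offensive_preds = [1, 0, 0, 1]
target_preds = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
rate = target_inconsistency_rate(offensive_preds, target_preds)  # third comment is inconsistent
```

A rate of 0.0 corresponds to the 0% figure described in the summary: every comment with a predicted target is also predicted offensive.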
Method
A Multitask Learning (MTL) approach uses a shared BERTimbau encoder to predict binary offensiveness, ordinal severity, and hate speech targets simultaneously on the HateBR dataset.
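To make the shared-encoder idea concrete, here is a dependency-free sketch of three task heads reading the same pooled representation, with a weighted sum of per-task losses. Everything specific in it is an assumption: the pooled vector stands in for BERTimbau output, the head sizes and loss weights are placeholders, and ordinal severity is simplified to plain categorical cross-entropy. The paper's actual heads and loss may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_head(rng, out_dim, in_dim):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class MultitaskHeads:
    """Three heads over one shared representation (hypothetical sizes)."""

    def __init__(self, hidden_size, num_severity, num_targets, seed=0):
        rng = random.Random(seed)
        self.offensive = init_head(rng, 1, hidden_size)
        self.severity = init_head(rng, num_severity, hidden_size)
        self.targets = init_head(rng, num_targets, hidden_size)

    def forward(self, pooled):
        # Every head consumes the same encoder output.
        return {
            "offensive": sigmoid(linear(*self.offensive, pooled)[0]),
            "severity": softmax(linear(*self.severity, pooled)),
            "targets": [sigmoid(z) for z in linear(*self.targets, pooled)],
        }


def bce(prob, label, eps=1e-12):
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))


def joint_loss(outputs, labels, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are assumed, not from the paper)."""
    loss_offensive = bce(outputs["offensive"], labels["offensive"])
    loss_severity = -math.log(outputs["severity"][labels["severity"]] + 1e-12)
    per_target = [bce(p, y) for p, y in zip(outputs["targets"], labels["targets"])]
    loss_targets = sum(per_target) / len(per_target)
    return (
        weights[0] * loss_offensive
        + weights[1] * loss_severity
        + weights[2] * loss_targets
    )


# Stand-in for a pooled encoder vector; values are arbitrary.
pooled = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4, 0.2, 0.1]
heads = MultitaskHeads(hidden_size=8, num_severity=4, num_targets=3)
outputs = heads.forward(pooled)
labels = {"offensive": 1, "severity": 2, "targets": [1, 0, 0]}
loss = joint_loss(outputs, labels)
```

Because all three losses flow back into the same encoder during training, gradients from one task can help or hurt another. That shared pathway is the usual explanation for both positive and negative transfer.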
In practice
- Use MTL for improved offensive detection.
- Consider trade-offs for fine-grained target tasks.
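When weighing the target-task trade-off, it helps to look at more than one averaging scheme. The sketch below computes micro- and macro-averaged F1 for multilabel predictions. The function name, the zero-division convention, and the toy data are assumptions for illustration; the summary only reports Micro-F1.

```python
def multilabel_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for lists of multi-hot label rows."""
    num_labels = len(y_true[0])
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p and not t:
                fp[j] += 1
            elif t and not p:
                fn[j] += 1

    def f1(tp_count, fp_count, fn_count):
        denom = 2 * tp_count + fp_count + fn_count
        # Assumed convention: F1 is 0.0 when a label has no positives at all.
        return 2 * tp_count / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / num_labels
    return micro, macro


# Toy data invented for illustration: label 0 is frequent, labels 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
micro_f1, macro_f1 = multilabel_f1(y_true, y_pred)
```

In this toy case the model never predicts the two rare labels, yet micro-F1 stays at 0.8 while macro-F1 falls to about 0.33. Under heavy class imbalance, reporting both gives a clearer view of how rare targets are handled.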
Topics
- Multitask Learning
- Offensive Language Detection
- Target Identification
- BERTimbau
- HateBR Dataset
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.