Negation-Aware Data Augmentation for Portuguese Natural Language Inference
Summary
This research investigates the effect of targeted data augmentation, focused on negation cues, in Portuguese Natural Language Inference (NLI) datasets. The study used the InferBR, ASSIN, and ASSIN2 datasets and synthetically generated new instances by negating hypotheses, making both the training and test sets more diverse. A BERT-based model was fine-tuned and evaluated on the combined and augmented datasets. The findings indicate that biases in how negation is used in the original data strongly affected the model's performance, and that the added diversity from negation-aware augmentation improved the model's ability to process and understand negation.
Key takeaway
Research scientists developing NLI models for Portuguese should consider negation-aware data augmentation. This approach can mitigate biases present in current datasets and improve a model's capacity to handle logical reasoning involving negation, leading to more robust and accurate NLI systems.
Key insights
Targeted negation-aware data augmentation improves BERT-based NLI model performance on Portuguese datasets by reducing negation bias.
Principles
- Negation is critical for logical reasoning.
- NLI datasets often underrepresent negation.
- Data diversity improves model robustness.
Method
Synthetically generate new NLI instances by negating hypotheses in existing datasets (InferBR, ASSIN, ASSIN2) to create more diverse training and test sets for BERT-based model fine-tuning.
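The augmentation step can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the summary does not say how hypotheses were negated or how labels were reassigned. The verb list, the rule of inserting "não" before the first matching verb, the three-way label names, and the label mapping are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny list of Portuguese verb forms (an assumption,
# not taken from the study). A real system would use a tagger or parser.
VERB_FORMS = ("é", "está", "são", "estão", "foi", "tem", "têm")

# Assumed label mapping for a three-way scheme: negating the hypothesis swaps
# entailment and contradiction and leaves neutral pairs neutral.
FLIPPED_LABEL = {
    "entailment": "contradiction",
    "contradiction": "entailment",
    "neutral": "neutral",
}

_VERB = re.compile(r"\b(?:" + "|".join(VERB_FORMS) + r")\b", re.IGNORECASE)
_NEGATION = re.compile(r"\bnão\b", re.IGNORECASE)


def negate_hypothesis(hypothesis):
    """Insert "não" before the first known verb form.

    Returns None when the sentence already contains "não" or when no verb
    from VERB_FORMS is found, so the caller can skip the example.
    """
    if _NEGATION.search(hypothesis):
        return None
    match = _VERB.search(hypothesis)
    if match is None:
        return None
    start = match.start()
    return hypothesis[:start] + "não " + hypothesis[start:]


def augment(examples):
    """Return the original examples followed by their negated counterparts."""
    augmented = list(examples)
    for example in examples:
        negated = negate_hypothesis(example["hypothesis"])
        if negated is None or example["label"] not in FLIPPED_LABEL:
            continue
        augmented.append({
            "premise": example["premise"],
            "hypothesis": negated,
            "label": FLIPPED_LABEL[example["label"]],
        })
    return augmented
```

Calling `augment` on a list of `{"premise", "hypothesis", "label"}` dictionaries returns the original pairs plus one negated pair for each hypothesis the heuristic can handle. The label mapping only holds when "contradiction" is read as strict logical opposition, so generated pairs would need spot-checking before use.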
In practice
- Augment NLI datasets with negated hypotheses.
- Test models for negation bias.
- Apply the approach to other low-resource languages.
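One simple way to test a model for negation bias, the second item in the list above, is to compare accuracy on hypotheses that contain a negation cue with accuracy on those that do not. The sketch below assumes examples are dictionaries with `hypothesis` and `label` keys and that predictions come from any trained classifier. The cue list is a hypothetical, incomplete sample, and the function names are illustrative rather than taken from the study.

```python
import re

# Hypothetical, incomplete list of Portuguese negation cues (an assumption).
NEGATION_CUES = ("não", "nunca", "jamais", "nem", "ninguém", "nenhum", "nenhuma")

_CUE = re.compile(r"\b(?:" + "|".join(NEGATION_CUES) + r")\b", re.IGNORECASE)


def has_negation(text):
    """Return True if the text contains any cue from NEGATION_CUES."""
    return _CUE.search(text) is not None


def accuracy_by_negation(examples, predictions):
    """Compute accuracy separately for negated and non-negated hypotheses.

    A subset with no examples gets None instead of an accuracy value.
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for example, predicted in zip(examples, predictions):
        negated = has_negation(example["hypothesis"])
        total[negated] += 1
        correct[negated] += int(predicted == example["label"])
    return {
        "with_negation": correct[True] / total[True] if total[True] else None,
        "without_negation": correct[False] / total[False] if total[False] else None,
    }
```

A large gap between the two accuracy values suggests the model relies on the presence of a cue rather than on the meaning of the sentence pair. Keyword matching is crude, so the split should be treated as a first diagnostic rather than a precise measurement.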
Topics
- Negation
- Data Augmentation
- Natural Language Inference
- Portuguese NLI
- BERT Models
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.