Negation-Aware Data Augmentation for Portuguese Natural Language Inference

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

This research investigates targeted data augmentation focused on negation cues in Portuguese Natural Language Inference (NLI) datasets. The study used the InferBR, ASSIN, and ASSIN2 datasets and synthetically generated new instances by negating hypotheses, increasing the diversity of both the training and test sets. A BERT-based model was fine-tuned and evaluated on the combined and augmented datasets. The findings indicate that biases in how negation is used in the original data significantly affected model performance, and that the added diversity from negation-aware augmentation improved the model's handling of negation.

Key takeaway

Research scientists developing NLI models for Portuguese should consider negation-aware data augmentation. It can mitigate negation biases present in current datasets and improve a model's handling of inferences that involve negation, leading to more robust and accurate NLI systems.

Key insights

Targeted negation-aware data augmentation improves BERT-based NLI model performance on Portuguese datasets by reducing negation bias.
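One way to make "negation bias" concrete is to measure how often negation cues appear in hypotheses for each label. The sketch below is illustrative only: the cue list, the function names, and the example record layout (`hypothesis`, `label` keys) are assumptions, not the paper's actual procedure.

```python
from collections import Counter

# Illustrative (assumed) list of Portuguese negation cues; not taken from the paper.
NEGATION_CUES = {"não", "nunca", "nem", "ninguém", "nada",
                 "nenhum", "nenhuma", "jamais", "sem"}

def has_negation(sentence):
    """Return True if any whitespace token, stripped of punctuation, is a cue."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in sentence.split())
    return any(tok in NEGATION_CUES for tok in tokens)

def negation_rate_by_label(examples):
    """Fraction of hypotheses containing a negation cue, per label."""
    totals = Counter()
    negated = Counter()
    for ex in examples:
        totals[ex["label"]] += 1
        if has_negation(ex["hypothesis"]):
            negated[ex["label"]] += 1
    return {label: negated[label] / totals[label] for label in totals}
```

A large gap between labels (for example, negation cues appearing mostly in one class) would suggest that a model could exploit the cue itself rather than reason about the sentence pair.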

Principles

Method

Synthetically generate new NLI instances by negating hypotheses in existing datasets (InferBR, ASSIN, ASSIN2) to create more diverse training and test sets for BERT-based model fine-tuning.
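The method above can be sketched as a small rule-based augmenter. This is a minimal illustration under stated assumptions, not the authors' implementation: the verb list, the function names, the record layout, and the label mapping (which assumes a three-way entailment/contradiction/neutral scheme and clean sentential negation) are all hypothetical. A real pipeline would more likely locate the main verb with a POS tagger or parser.

```python
# Hypothetical, tiny list of finite verb forms used to find an insertion point.
VERB_FORMS = {"é", "está", "são", "estão", "foi", "tem", "há"}

# Assumed label mapping when the hypothesis is negated as a whole sentence:
# if P entails H, P contradicts not-H, and vice versa; neutral stays neutral.
FLIPPED_LABEL = {
    "entailment": "contradiction",
    "contradiction": "entailment",
    "neutral": "neutral",
}

def negate_hypothesis(hypothesis):
    """Insert 'não' before the first known verb form; return None if not possible."""
    tokens = hypothesis.split()
    for i, tok in enumerate(tokens):
        word = tok.lower().strip(".,;:!?")
        if word in VERB_FORMS:
            # Skip hypotheses that are already negated at this verb.
            if i > 0 and tokens[i - 1].lower() == "não":
                return None
            return " ".join(tokens[:i] + ["não"] + tokens[i:])
    return None

def augment(examples):
    """Build new instances with negated hypotheses and remapped labels."""
    augmented = []
    for ex in examples:
        negated = negate_hypothesis(ex["hypothesis"])
        if negated is None or ex["label"] not in FLIPPED_LABEL:
            continue
        augmented.append({
            "premise": ex["premise"],
            "hypothesis": negated,
            "label": FLIPPED_LABEL[ex["label"]],
        })
    return augmented
```

Examples that cannot be negated safely are skipped rather than guessed at, since a wrong label on a synthetic instance would add noise instead of diversity. Datasets with other label sets (such as a paraphrase class) would need their own mapping.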

In practice

Topics

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.