Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference
Summary
Neural Natural Language Inference (NLI) models frequently overfit dataset artifacts, learning spurious correlations rather than genuine reasoning. For instance, a hypothesis-only model, which never sees the premise, achieved 57.7% accuracy on SNLI, well above the roughly 33% expected by chance on a three-way task. This indicates significant reliance on these artifacts, which account for 38.6% of baseline errors. To mitigate this, the paper proposes Product-of-Experts (PoE) training, a method that downweights examples where biased models exhibit high confidence. PoE training nearly maintains model accuracy (89.10% versus a baseline of 89.30%) while reducing bias reliance by 4.71 percentage points (bias agreement falls from 49.85% to roughly 45%). An ablation study identified a lambda value of 1.5 as the best balance between debiasing and accuracy. Despite these improvements, behavioral tests revealed persistent problems with negation and numerical reasoning.
Key takeaway
For AI engineers developing NLI models, Product-of-Experts (PoE) training can measurably reduce reliance on dataset artifacts at a small cost in accuracy (about 0.2 points in the reported results). Consider applying PoE if your models lean heavily on spurious correlations, and tune the lambda parameter on your own datasets to balance debiasing against performance.
Key insights
Product-of-Experts (PoE) training reduces NLI model reliance on dataset artifacts while preserving accuracy.
Principles
- NLI models overfit dataset artifacts.
- Downweighting overconfident biased examples reduces bias.
Method
PoE training downweights examples on which biased models are overconfident. A lambda parameter controls the trade-off between debiasing and accuracy; the ablation study found 1.5 to work best.
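The downweighting described above can be illustrated with a minimal sketch. It assumes the common PoE formulation, in which the main model's log-probabilities are added to the biased model's log-probabilities and the sum is renormalized before taking cross-entropy. It also assumes that lambda scales the biased model's term; the summary does not state how lambda enters the loss. The function names poe_loss and main_grad_l1 are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def poe_loss(main_logits, bias_logits, label, lam=1.5):
    """Cross-entropy of the combined distribution for one example.

    Assumption: lam scales the biased model's log-probabilities.
    With lam=0 this reduces to ordinary cross-entropy on the main model.
    """
    main_lp = log_softmax(main_logits)
    bias_lp = log_softmax(bias_logits)
    combined = [m + lam * b for m, b in zip(main_lp, bias_lp)]
    return -log_softmax(combined)[label]

def main_grad_l1(main_logits, bias_logits, label, lam=1.5):
    """L1 norm of the loss gradient with respect to the main logits.

    The gradient is softmax(combined) minus the one-hot label, so it
    shrinks when the biased model is already confident and correct.
    """
    main_lp = log_softmax(main_logits)
    bias_lp = log_softmax(bias_logits)
    combined = [m + lam * b for m, b in zip(main_lp, bias_lp)]
    probs = [math.exp(lp) for lp in log_softmax(combined)]
    return sum(abs(p - (1.0 if i == label else 0.0)) for i, p in enumerate(probs))

# An undecided main model paired with two different biased models.
undecided = [0.0, 0.0, 0.0]
print(main_grad_l1(undecided, [0.0, 0.0, 0.0], label=0))  # uninformative bias
print(main_grad_l1(undecided, [5.0, 0.0, 0.0], label=0))  # confident bias
```

In this sketch the second call yields a much smaller gradient than the first: when the biased model already predicts the label confidently, the main model receives little training signal from that example. Only the main model would be used at inference time.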
In practice
- Implement PoE training to reduce NLI model bias.
- Tune lambda to balance debiasing and accuracy.
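Tuning lambda requires a way to compare runs. The sketch below is one possible setup, not the paper's evaluation code. It assumes that "bias agreement" means the share of examples on which the main model and the biased model predict the same class, which the summary does not define. The helper pick_lambda and its max_acc_drop budget are hypothetical.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def bias_agreement(main_preds, bias_preds):
    """Fraction of examples where main and biased models predict the same class.

    Assumption: one plausible reading of "bias agreement"; the summary
    does not define the metric.
    """
    return sum(m == b for m, b in zip(main_preds, bias_preds)) / len(main_preds)

def pick_lambda(runs, baseline_acc, max_acc_drop=0.005):
    """Pick the lambda with the lowest bias agreement within an accuracy budget.

    runs maps lambda -> (accuracy, bias_agreement), both in [0, 1].
    Returns None if no run stays within max_acc_drop of baseline_acc.
    """
    within_budget = {
        lam: metrics
        for lam, metrics in runs.items()
        if baseline_acc - metrics[0] <= max_acc_drop
    }
    if not within_budget:
        return None
    return min(within_budget, key=lambda lam: within_budget[lam][1])

# Toy predictions, made up for illustration (not results from the paper).
gold = ["entail", "neutral", "contra", "neutral"]
main = ["entail", "neutral", "contra", "contra"]
bias = ["entail", "contra", "contra", "contra"]
print(accuracy(main, gold))        # 3 of 4 correct
print(bias_agreement(main, bias))  # 3 of 4 agree
```

Passing one (accuracy, bias_agreement) pair per trained lambda to pick_lambda makes the trade-off explicit: the accuracy budget is fixed first, and the least bias-aligned run inside it is selected.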
Topics
- Product-of-Experts Training
- Natural Language Inference
- Dataset Artifacts
- Model Debiasing
- Spurious Correlations
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.