Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom
Summary
A study investigated data augmentation and resampling strategies to improve transformer-based text classification for automated scoring of student scientific explanations, with a focus on class imbalance in rubric categories. Using a dataset of 1,466 high school responses scored on 11 binary-coded analytic categories, the researchers fine-tuned SciBERT as a baseline. They tested three augmentation strategies alongside SMOTE oversampling: GPT-4 generated synthetic responses, EASE (word-level extraction and filtering), and ALP (phrase-level extraction using a lexicalized probabilistic context-free grammar). Fine-tuning SciBERT on its own improved recall; adding augmentation significantly enhanced performance. GPT-4 data boosted both precision and recall, and ALP achieved perfect precision, recall, and F1 scores on the most severely imbalanced categories (5, 6, 7, and 9). EASE augmentation substantially increased alignment with human scoring across all rubric categories, covering both scientific ideas (Categories 1–6) and inaccurate ideas (Categories 7–11). Together, the results indicate that targeted augmentation can address severe imbalance while preserving conceptual coverage.
Key takeaway
If you are building automated scoring systems for educational assessments, particularly ones with imbalanced rubric categories, prioritize language-grounded data augmentation strategies such as EASE or ALP over traditional oversampling methods such as SMOTE. In the study, these techniques improved precision, recall, and F1 scores for underrepresented but instructionally critical categories, which supports more accurate and reliable placement of students within learning progressions. Consider integrating these augmentation pipelines to improve model robustness and scalability without extensive manual data collection.
Key insights
Targeted data augmentation significantly improves transformer-based model performance in automated scoring of imbalanced educational text data.
Principles
- Class imbalance severely degrades model performance on minority classes.
- Language-grounded augmentation outperforms feature-space resampling.
- Augmentation preserves conceptual coverage in educational assessments.
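The first principle can be made concrete with a small sketch. It uses made-up labels, not the study's data, to show why per-category precision, recall, and F1 on the positive class matter more than accuracy when a rubric category is rare. The function names and the 5-in-100 split are assumptions chosen for illustration.

```python
from typing import Dict, List


def binary_prf(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy rubric category: 5 positive responses out of 100 (illustrative only).
y_true = [1] * 5 + [0] * 95
# A degenerate scorer that always predicts the majority class.
y_pred = [0] * 100

majority_accuracy = accuracy(y_true, y_pred)
majority_scores = binary_prf(y_true, y_pred)
print(majority_accuracy, majority_scores)
```

The always-negative scorer looks accurate overall while never identifying a single student who expressed the rare idea, which is the failure mode that augmentation of minority classes is meant to address.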
Method
Fine-tune SciBERT, then apply data-space augmentation to minority classes to balance the rubric categories: GPT-4 at the document level, EASE at the word level, and ALP at the phrase level.
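The balancing step described above can be sketched for a single binary-coded rubric category. This is a minimal illustration under stated assumptions, not the authors' pipeline: `balance_category` and the `augment` callable are hypothetical names, and the augmenter is left pluggable so that a GPT-4, EASE, or ALP generator could be substituted. The sketch assumes it is applied to the training split only.

```python
import random
from typing import Callable, List, Tuple

# (response text, binary label for one rubric category)
Example = Tuple[str, int]


def balance_category(
    examples: List[Example],
    augment: Callable[[str, random.Random], str],
    seed: int = 0,
) -> List[Example]:
    """Add synthetic minority-class examples until both labels are equally frequent.

    `augment` is a hypothetical hook: it receives a minority-class response
    and returns a new text that should keep the same label.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        # Nothing to augment from; return the data unchanged.
        return list(examples)
    deficit = len(majority) - len(minority)
    synthetic = []
    for _ in range(deficit):
        text, label = rng.choice(minority)
        synthetic.append((augment(text, rng), label))
    return list(examples) + synthetic


def identity_augment(text: str, rng: random.Random) -> str:
    """Placeholder augmenter: returns the text unchanged (plain random oversampling)."""
    return text


train = [("energy transfers to the ball", 1), ("heat moves to the cup", 1)]
train += [(f"off-topic response {i}", 0) for i in range(6)]
balanced = balance_category(train, identity_augment)
print(len(train), len(balanced))
```

With the identity placeholder this reduces to random oversampling of duplicates; the point of the language-grounded strategies is to replace that placeholder with a generator that produces new, label-preserving text.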
In practice
- Use GPT-4 for generating diverse synthetic student responses.
- Employ EASE for robust word-level augmentation in educational texts.
- Apply ALP for phrase-level augmentation to increase syntactic diversity.
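As a rough illustration of word-level augmentation with a filtering step, the sketch below perturbs a response by random deletion and random swap, then keeps only variants that still contain a set of key terms. This is a generic stand-in in the style of simple text perturbation, not an implementation of EASE or ALP, whose details are not given here. The function names, the `key_terms` set, and the example sentence are all assumptions.

```python
import random
from typing import List, Set


def normalize(word: str) -> str:
    """Lowercase a token and strip surrounding punctuation for term matching."""
    return word.lower().strip(".,;:!?")


def random_delete(words: List[str], rng: random.Random, p: float = 0.2) -> List[str]:
    """Drop each word with probability p; fall back to the input if all are dropped."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else list(words)


def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions (no-op for fewer than two words)."""
    if len(words) < 2:
        return list(words)
    i, j = rng.sample(range(len(words)), 2)
    out = list(words)
    out[i], out[j] = out[j], out[i]
    return out


def augment_response(
    text: str, key_terms: Set[str], n_variants: int = 3, seed: int = 0
) -> List[str]:
    """Generate up to n_variants perturbed copies that retain every key term."""
    rng = random.Random(seed)
    words = text.split()
    variants: List[str] = []
    for _ in range(n_variants * 10):  # bounded number of attempts
        candidate = random_swap(random_delete(words, rng), rng)
        candidate_text = " ".join(candidate)
        # Filter: discard variants that lost a concept-bearing term.
        retains_terms = key_terms <= {normalize(w) for w in candidate}
        if retains_terms and candidate_text != text and candidate_text not in variants:
            variants.append(candidate_text)
        if len(variants) == n_variants:
            break
    return variants


original = "the ball gains kinetic energy as it falls toward the ground"
variants = augment_response(original, {"kinetic", "energy"})
for v in variants:
    print(v)
```

The filter is the design choice worth noting: perturbation alone can delete the very words that justify a rubric label, so candidates are generated freely and then screened for the terms that carry the scored idea.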
Topics
- Data Augmentation
- Transformer Models
- AI Scoring
- Class Imbalance
- Scientific Explanations
Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.