Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom

2026-04-24 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A study investigated data augmentation and resampling strategies for improving transformer-based text classification in automated scoring of student scientific explanations, with a focus on class imbalance across rubric categories. Using a dataset of 1,466 high school responses scored on 11 binary-coded analytic categories, the researchers fine-tuned SciBERT as a baseline. They tested three augmentation strategies: GPT-4-generated synthetic responses, EASE (word-level extraction and filtering), and ALP (phrase-level extraction using a lexicalized probabilistic context-free grammar), alongside SMOTE oversampling. Fine-tuning SciBERT alone improved recall, but augmentation produced larger gains. GPT-4 data boosted both precision and recall, and ALP achieved perfect precision, recall, and F1 scores on the most severely imbalanced categories (5, 6, 7, and 9). Across all rubric categories, EASE augmentation substantially increased alignment with human scoring for both scientific ideas (Categories 1–6) and inaccurate ideas (Categories 7–11). Together, these results suggest that targeted augmentation can address severe imbalance while preserving conceptual coverage.
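
To make the reported metrics concrete, here is a minimal sketch of how precision, recall, and F1 could be computed separately for each binary-coded rubric category. The function name `per_category_prf` and the three-category toy data are assumptions for illustration; the study's rubric has 11 categories, and this is not the authors' evaluation code.

```python
from typing import Dict, List


def per_category_prf(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> List[Dict[str, float]]:
    """Precision, recall, and F1 for each binary rubric category (column)."""
    n_categories = len(y_true[0])
    results = []
    for c in range(n_categories):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Report 0.0 when a ratio is undefined (no predictions or no positives).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
        )
    return results


# Toy data: rows are responses, columns are rubric categories (hypothetical).
human = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
model = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
scores = per_category_prf(human, model)
```

In the toy data the middle category has a single positive example and the model misses it, so its recall is zero even though most individual predictions are correct. Scoring per category keeps failures like that on rare categories from being hidden by an aggregate score.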

Key takeaway

AI engineers building automated scoring systems for educational assessments with imbalanced rubric categories should prioritize language-grounded augmentation strategies such as EASE or ALP over traditional oversampling methods such as SMOTE. In the study, these techniques improved precision, recall, and F1 for underrepresented but instructionally critical categories, which supports more accurate and reliable placement of students within learning progressions. Integrating such augmentation pipelines can improve model robustness and scalability without extensive manual data collection.

Key insights

Targeted data augmentation significantly improves transformer-based model performance in automated scoring of imbalanced educational text data.

Principles

Class imbalance severely degrades model performance on minority classes.
Language-grounded augmentation outperforms feature-space resampling.
Augmentation preserves conceptual coverage in educational assessments.
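
For contrast with language-grounded augmentation, the core step of SMOTE-style feature-space resampling can be sketched as interpolation between a minority-class vector and one of its nearest minority neighbors. This is a simplified, standard-library illustration on hypothetical two-dimensional vectors; the function `smote_like` is an assumption and not the configuration used in the study.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(
    minority: List[Vector], n_new: int, k: int = 2, seed: int = 0
) -> List[Vector]:
    """Create n_new synthetic vectors by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority vectors to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point, excluding itself.
        others = [v for v in minority if v is not base]
        neighbors = sorted(others, key=lambda v: squared_distance(base, v))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors (for example, reduced embeddings).
minority_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority_embeddings, n_new=5)
```

Each synthetic sample is a point in feature space with no corresponding student sentence, which is one plausible reason text-level augmentation can behave differently from this kind of resampling.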

Method

Fine-tune SciBERT as a baseline, then augment minority classes in data space to balance the rubric categories: GPT-4 at the document level, EASE at the word level, and ALP at the phrase level.
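
The balancing step can be sketched as a loop that tops up the minority class of one rubric category with synthetic responses until both classes have the same count. Everything named here is hypothetical: `balance_with_augmentation`, the toy responses, and `placeholder_augment`, which only marks where a GPT-4, EASE, or ALP generator would plug in and implements none of them.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (response text, binary label for one rubric category)


def balance_with_augmentation(
    examples: List[Example], augment: Callable[[str], str], seed: int = 0
) -> List[Example]:
    """Add synthetic minority-class examples until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [text for text, label in examples if label == 1]
    negatives = [text for text, label in examples if label == 0]
    if len(positives) <= len(negatives):
        minority, minority_label = positives, 1
    else:
        minority, minority_label = negatives, 0
    if not minority:
        raise ValueError("minority class has no examples to augment from")
    deficit = abs(len(positives) - len(negatives))
    synthetic = [
        (augment(rng.choice(minority)), minority_label) for _ in range(deficit)
    ]
    return list(examples) + synthetic


def placeholder_augment(text: str) -> str:
    """Hypothetical stand-in; a real pipeline would call an augmenter here."""
    return text + " (paraphrased)"


# Toy training split for one category: four negatives, one positive.
train = [
    ("The ball just stops.", 0),
    ("It slows down.", 0),
    ("Gravity pulls it.", 0),
    ("I am not sure.", 0),
    ("Energy transfers to the ground as heat.", 1),
]
balanced = balance_with_augmentation(train, placeholder_augment)
```

A sketch like this would normally be applied to the training split only, so that evaluation still reflects the class distribution of real student responses.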

In practice

Use GPT-4 for generating diverse synthetic student responses.
Employ EASE for robust word-level augmentation in educational texts.
Apply ALP for phrase-level augmentation to increase syntactic diversity.
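
As one way to approach the GPT-4 item, the sketch below assembles a generation prompt from a rubric category description and a few real seed responses. The prompt wording, the function `build_generation_prompt`, and the example category are all assumptions for illustration and are not the prompt used in the study. No API call is made; the resulting string would be passed to whichever client is in use.

```python
from typing import List


def build_generation_prompt(
    category_description: str, seed_responses: List[str], n_variants: int = 5
) -> str:
    """Assemble a prompt asking an LLM for synthetic minority-class responses."""
    if not seed_responses:
        raise ValueError("provide at least one real response as a seed")
    examples = "\n".join(f"- {response}" for response in seed_responses)
    return (
        "You are helping create training data for a scoring model.\n"
        f"Rubric category: {category_description}\n"
        "Real student responses that meet this category:\n"
        f"{examples}\n"
        f"Write {n_variants} new responses a high school student might plausibly "
        "write that also meet this category. Vary wording and sentence structure, "
        "keep the same scientific idea, and return one response per line."
    )


# Hypothetical category and seed response, for illustration only.
prompt = build_generation_prompt(
    "States that energy is transferred between objects",
    ["The moving cart gives some of its energy to the block when they hit."],
    n_variants=3,
)
```

Generated responses would still need to be checked against the rubric before being added to the training data, since a synthetic response that no longer matches its label adds noise to an already scarce class.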

Topics

Data Augmentation
Transformer Models
AI Scoring
Class Imbalance
Scientific Explanations

Code references

Prud11djagba/-Optimizing-AI-Scoring-of-Scientific-Explanations-Exploring-Augmentation-Strategies-

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.