Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses
Summary
Deep-learning-based Automatic Essay Scoring (AES) systems are used in high-stakes educational applications, yet they are surprisingly brittle to adversarial inputs. Interpretability analyses show that these "end-to-end" models, including those built on BERT embeddings, often behave like bag-of-words models. The result is "overstability," where a few words determine the essay score regardless of context, and "oversensitivity," where minor input changes cause large score shifts. The oversensitivity stems from dataset biases: the models associate an essay with a specific score when a few words that frequently co-occur with that score are present, so adding only a few words changes the score in approximately 95% of samples. To address these issues, the authors propose detection-based protection models that identify unusual attribution patterns and flag adversarial samples with high accuracy.
Key takeaway
For AI Product Managers overseeing AES system development, it is critical to understand that current deep-learning models can be both overstable and oversensitive. Your teams should run interpretability analyses during model training to identify and reduce reliance on superficial word patterns and dataset biases. Adding detection-based protection models can improve the reliability and fairness of your AES systems by flagging manipulated inputs before they distort scores.
Key insights
Deep-learning AES models are both overstable and oversensitive due to bag-of-words behavior and dataset biases.
Principles
- AES models prioritize specific words over contextual understanding.
- Dataset biases can lead to model oversensitivity.
Method
The study uses interpretability techniques to analyze feature importance in AES, investigating oversensitivity and overstability, and proposes detection-based protection models.
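As a rough illustration of what a feature-importance analysis can look like, the sketch below uses leave-one-out (occlusion) attribution, a simple technique swapped in here for demonstration; this summary does not state which attribution method the study uses. The scorer `toy_score`, its weights, and the helper `occlusion_attributions` are hypothetical stand-ins, not the paper's model or code.

```python
from typing import Callable, List, Tuple

# Hypothetical toy scorer: a bag-of-words model with hand-picked weights.
# It stands in for a trained AES model and is NOT the model from the paper.
TOY_WEIGHTS = {"moreover": 2.0, "evidence": 1.0, "therefore": 0.5}


def toy_score(tokens: List[str]) -> float:
    """Score an essay as the sum of per-word weights; word order is ignored."""
    return sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def occlusion_attributions(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Attribute to each token the score drop observed when it is removed."""
    base = score_fn(tokens)
    return [
        (tok, base - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


essay = "The evidence is clear and therefore the policy works".split()
attributions = occlusion_attributions(essay, toy_score)

# The two tokens with the largest absolute attribution drive the score.
top_two = sorted(attributions, key=lambda pair: abs(pair[1]), reverse=True)[:2]

# Overstability check: reversing the word order destroys the meaning,
# yet a bag-of-words scorer returns exactly the same score.
shuffled = list(reversed(essay))
same_score_after_shuffle = toy_score(shuffled) == toy_score(essay)
```

With a real model, `score_fn` would wrap the model's prediction call, and a gradient-based attribution method could replace the occlusion loop.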
In practice
- Identify specific words driving AES scores.
- Evaluate AES models for dataset biases.
- Implement detection models for adversarial samples.
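The checks above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `toy_score` and its weights are a hypothetical stand-in for a trained scorer, `score_change_rate` is an illustrative oversensitivity probe, and `flag_suspicious` is a simple concentration heuristic, not the detection model proposed in the paper. The thresholds are arbitrary.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[str]], float]

# Hypothetical stand-in scorer with hand-picked weights (not the paper's model).
WEIGHTS = {
    "evidence": 1.0, "argument": 1.0, "therefore": 0.5,
    "example": 0.5, "conclude": 0.5, "moreover": 2.0,
}


def toy_score(tokens: List[str]) -> float:
    """Sum of per-word weights; word order and context are ignored."""
    return sum(WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def score_change_rate(
    essays: Sequence[List[str]], added_words: Sequence[str],
    score_fn: Scorer, min_delta: float = 0.5,
) -> float:
    """Fraction of essays whose score moves by at least min_delta
    after the given words are appended (an oversensitivity probe)."""
    changed = sum(
        1 for e in essays
        if abs(score_fn(e + list(added_words)) - score_fn(e)) >= min_delta
    )
    return changed / len(essays)


def attribution_concentration(
    tokens: List[str], score_fn: Scorer, top_k: int = 2
) -> float:
    """Share of total absolute occlusion attribution held by the top_k tokens."""
    base = score_fn(tokens)
    attrs = sorted(
        (abs(base - score_fn(tokens[:i] + tokens[i + 1:]))
         for i in range(len(tokens))),
        reverse=True,
    )
    total = sum(attrs)
    return sum(attrs[:top_k]) / total if total > 0 else 0.0


def flag_suspicious(
    tokens: List[str], score_fn: Scorer, top_k: int = 2, threshold: float = 0.9
) -> bool:
    """Flag an essay when nearly all attribution sits on a handful of tokens."""
    return attribution_concentration(tokens, score_fn, top_k) >= threshold


normal = ("the evidence supports the argument therefore "
          "this example lets us conclude").split()
padded = "blah blah moreover blah moreover blah".split()

rate = score_change_rate([normal, padded], ["moreover"], toy_score)
normal_flagged = flag_suspicious(normal, toy_score)
padded_flagged = flag_suspicious(padded, toy_score)
```

In this toy setup the padded essay is flagged because two trigger words carry all of its attribution, while the ordinary essay spreads attribution across five words. A production detector would need to be trained and validated on real adversarial samples rather than rely on a fixed threshold.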
Topics
- Automatic Essay Scoring
- Deep Learning Models
- Model Interpretability
- Adversarial Brittleness
- Dataset Bias
Best for: Research Scientist, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.