Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

2026-04-19 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Deep-learning-based Automatic Essay Scoring (AES) systems are used in high-stakes educational applications, yet they are surprisingly brittle under adversarial input. Interpretability analysis shows that these "end-to-end" models, including those built on BERT embeddings, often behave like bag-of-words models. The result is "overstability," where a few words determine the score regardless of the surrounding context, and "oversensitivity," where minor input changes cause large score shifts. The oversensitivity arises because the models learn dataset biases: they associate an essay with a specific score based on the co-occurrence of a few words, so that adding a small number of words changes the score in approximately 95% of samples. To address both problems, the research proposes detection-based protection models that identify unusual attribution patterns and flag adversarial samples with high accuracy.

Key takeaway

For AI Product Managers overseeing AES system development, the critical point is that current deep-learning models can be both overstable and oversensitive. Your teams should run interpretability analyses during model training to find and reduce reliance on superficial word patterns and dataset biases. Adding detection-based protection models can improve the reliability and fairness of your AES systems by flagging manipulated essays before their scores are trusted.

Key insights

Deep-learning AES models are both overstable and oversensitive due to bag-of-words behavior and dataset biases.

Principles

AES models prioritize specific words over contextual understanding.
Dataset biases can lead to model oversensitivity.
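
The two principles above can be probed with very simple tests. The sketch below is illustrative only: `TOY_WEIGHTS` and `toy_score` are hypothetical stand-ins for a trained scorer, not the models studied in the paper. It checks whether the score survives shuffling the words (a sign of bag-of-words behavior and overstability) and how far the score moves when two words are appended (a sign of oversensitivity).

```python
import random

# Hypothetical per-word weights standing in for a trained AES model.
# Integer weights keep the arithmetic exact for the comparison below.
TOY_WEIGHTS = {"therefore": 3, "evidence": 2, "however": 1, "stuff": -1}

def toy_score(essay: str) -> int:
    """Score an essay as a sum of per-word weights; word order is ignored."""
    return sum(TOY_WEIGHTS.get(word, 0) for word in essay.lower().split())

def shuffled(essay: str, seed: int = 0) -> str:
    """Return the essay with its words in a random but reproducible order."""
    words = essay.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

essay = "However the evidence is clear therefore the claim holds"

# Overstability probe: destroying the word order should change the score
# of a context-aware model, but it does not change this one.
unchanged_by_shuffle = toy_score(essay) == toy_score(shuffled(essay))

# Oversensitivity probe: appending two high-weight words shifts the score.
score_shift = toy_score(essay + " therefore evidence") - toy_score(essay)

print(unchanged_by_shuffle, score_shift)
```

To run the same probes against a real system, replace `toy_score` with a call to the model under test and compare the scores within a tolerance rather than for exact equality.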

Method

The study uses interpretability techniques to analyze feature importance in AES, investigating oversensitivity and overstability, and proposes detection-based protection models.
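
The summary does not name the specific interpretability technique, so the sketch below swaps in leave-one-out occlusion, one of the simplest ways to estimate feature importance: remove each token in turn and record how much the score drops. The scorer `toy_score` and its weights are hypothetical placeholders for a real AES model.

```python
from typing import Callable, List, Tuple

# Hypothetical per-word weights; a real AES model would replace toy_score.
TOY_WEIGHTS = {"therefore": 3, "evidence": 2, "however": 1}

def toy_score(tokens: List[str]) -> float:
    """Sum the weights of the tokens; unknown tokens contribute nothing."""
    return sum(TOY_WEIGHTS.get(tok.lower(), 0) for tok in tokens)

def occlusion_attributions(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Attribute to each token the score lost when that token is removed."""
    base = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        attributions.append((tok, base - score_fn(reduced)))
    return attributions

tokens = ["the", "evidence", "is", "strong"]
attributions = occlusion_attributions(tokens, toy_score)
print(attributions)
# → [('the', 0), ('evidence', 2), ('is', 0), ('strong', 0)]
```

Here a single word carries the entire score, which is the kind of pattern that points to bag-of-words behavior. Occlusion needs one model call per token, so it is practical for spot checks but slow for long essays.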

In practice

Identify specific words driving AES scores.
Evaluate AES models for dataset biases.
Implement detection models for adversarial samples.
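
As a starting point for the last step, one simple heuristic is to flag essays whose attribution mass is concentrated in a handful of tokens. This is a minimal sketch under stated assumptions, not the detection model from the paper: the function names, `k`, and `threshold` are hypothetical, and the attribution values are assumed to come from whichever interpretability method is already in use.

```python
from typing import Sequence

def top_k_share(attributions: Sequence[float], k: int = 3) -> float:
    """Fraction of total absolute attribution held by the k largest tokens."""
    magnitudes = sorted((abs(a) for a in attributions), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # no attribution at all, so nothing is concentrated
    return sum(magnitudes[:k]) / total

def flag_unusual(
    attributions: Sequence[float], k: int = 3, threshold: float = 0.8
) -> bool:
    """Flag an essay when its top-k tokens hold more than `threshold` of the mass."""
    return top_k_share(attributions, k) > threshold

# Two tokens dominate the score: flagged for review.
concentrated = flag_unusual([5.0, 4.0, 0.1, 0.1, 0.1, 0.1])
# Attribution is spread evenly over ten tokens: not flagged.
spread_out = flag_unusual([1.0] * 10)
print(concentrated, spread_out)
```

The values of `k` and `threshold` would need to be tuned on held-out essays with known scores, since short essays naturally concentrate attribution in fewer tokens.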

Topics

Automatic Essay Scoring
Deep Learning Models
Model Interpretability
Adversarial Brittleness
Dataset Bias

Best for: Research Scientist, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.