Evaluating Automated Scoring Models on Official ENEM Essays
Summary
A new labeled dataset of 157 official ENEM (Exame Nacional do Ensino Médio) essays in Brazilian Portuguese has been introduced to evaluate Automated Essay Scoring (AES) systems. It addresses a gap in the literature: previous work relied primarily on mock-exam essays. The analysis shows that the official ENEM dataset shares characteristics with existing mock-exam datasets. For a dataset this small, the study found that Large Language Models (LLMs) pretrained on mock exams score official ENEM essays considerably better than models trained exclusively on official data, with an average gain of 0.27 in Quadratic Weighted Kappa (QWK).
Key takeaway
For AI engineers building Automated Essay Scoring systems for standardized tests such as ENEM, pretraining LLMs on readily available mock-exam data can substantially improve scoring accuracy on official exam essays, even when official data is scarce. The result is more robust scoring models and faster feedback for students practicing for the exam.
Key insights
LLMs pretrained on mock exams significantly improve automated scoring of official ENEM essays, even with small datasets.
Principles
- Mock exam data can generalize to official exams.
- Pretraining LLMs boosts performance on small datasets.
Method
The study created a new labeled dataset of 157 official ENEM essays and evaluated LLMs pretrained on mock exams against models trained solely on official data, measuring performance with Quadratic Weighted Kappa.
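To make the evaluation metric concrete, here is a minimal sketch of Quadratic Weighted Kappa in plain Python. It is an illustration only, not the study's evaluation code: the function name, the integer label encoding (0 to num_labels - 1), and the handling of the degenerate single-label case are assumptions made for this example.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, num_labels):
    """Agreement between two raters on ordinal labels 0..num_labels-1.

    Illustrative sketch; the label encoding is an assumption, not the
    scoring scale used in the study.
    """
    if len(human) != len(model) or not human:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(human)
    observed = Counter(zip(human, model))  # observed (human, model) pair counts
    human_hist = Counter(human)            # marginal counts per rater
    model_hist = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic penalty: grows with the squared distance between labels.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Pair count expected if the two raters were independent.
            weighted_expected += weight * human_hist[i] * model_hist[j] / n

    if weighted_expected == 0.0:
        # Both raters used one identical label throughout; treating this
        # as perfect agreement is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Because the penalty is quadratic, a model that misses a human score by two levels is penalized four times as much as one that misses by a single level.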
In practice
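- Pretrain the LLM on mock-exam essays, which are far easier to obtain than official ones.
- Apply the pretrained model to the small official-essay dataset rather than training on official data alone.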
Topics
- Automated Essay Scoring
- ENEM Exam
- Brazilian Portuguese
- Large Language Models
- Quadratic Weighted Kappa
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.