Evaluating Automated Scoring Models on Official ENEM Essays

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The paper introduces a labeled dataset of 157 official ENEM (Exame Nacional do Ensino Médio) essays in Brazilian Portuguese for evaluating Automated Essay Scoring (AES) systems. The dataset fills a gap in the literature, since previous work relied primarily on mock-exam essays. The analysis finds that the official ENEM essays share characteristics with existing mock-exam datasets. For a dataset this small, Large Language Models (LLMs) pretrained on mock exams score official ENEM essays significantly better than models trained exclusively on official data, with an average gain of 0.27 points in the Quadratic Weighted Kappa (QWK) metric.

Key takeaway

For AI engineers building Automated Essay Scoring systems for standardized tests like ENEM, LLMs pretrained on readily available mock-exam data can substantially improve scoring accuracy on official exam essays, even when official data is scarce. More robust scorers in turn allow faster feedback cycles and more practice opportunities for students.

Key insights

LLMs pretrained on mock exams significantly improve automated scoring of official ENEM essays, even with small datasets.

Principles

Models trained on mock-exam data can generalize to official exams.
Pretraining LLMs on related data boosts performance when the target dataset is small.

Method

The study created a new labeled dataset of 157 official ENEM essays and evaluated LLMs pretrained on mock exams against models trained solely on official data, measuring performance with Quadratic Weighted Kappa.
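For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of Quadratic Weighted Kappa, not the paper's evaluation code. It assumes scores have already been mapped to integer levels 0..num_labels-1 (for example, one level per grading step of a competency); the function name and the sample ratings are illustrative only.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_labels):
    """QWK between two equal-length lists of integer ratings in range(num_labels)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if num_labels < 2:
        raise ValueError("num_labels must be at least 2")
    n = len(y_true)

    # Observed confusion matrix: rows are human ratings, columns are model ratings.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of each rater.
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_labels))
                 for j in range(num_labels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic penalty: disagreements further apart cost more.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            # Expected count if the two raters were independent.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one and the same label throughout; this sketch
        # treats that degenerate case as perfect agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Illustrative ratings only, not data from the paper.
human = [0, 2, 3, 5, 4, 1]
model = [0, 2, 4, 5, 3, 1]
print(quadratic_weighted_kappa(human, model, num_labels=6))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement, which gives a sense of scale for the reported average gain of 0.27.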

In practice

Use mock exam data for LLM pretraining.
Apply LLMs to small, official essay datasets.
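The two steps above amount to a training order: fit on the larger mock-exam set first, then continue training on the small official set from those weights. The sketch below shows only that order. It swaps the paper's LLMs for a toy linear model, and every feature, score, and hyperparameter in it is synthetic and hypothetical.

```python
def predict(weights, features):
    """Linear score: bias plus weighted sum of features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def mse(weights, data):
    """Mean squared error of the model over (features, score) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def train(weights, data, epochs, lr=0.1):
    """Full-batch gradient descent on MSE, starting from the given weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            for k, xk in enumerate(x, start=1):
                grad[k] += err * xk
        w = [wk - lr * 2 * gk / len(data) for wk, gk in zip(w, grad)]
    return w


# Synthetic stand-ins (assumption): two made-up features per essay.
# The "mock" set is larger; the "official" set is tiny and slightly shifted.
mock = [(((i % 10) / 10, (i * 3 % 10) / 10), None) for i in range(50)]
mock = [(x, 1.0 + 2.0 * x[0] + 1.0 * x[1]) for x, _ in mock]
official_x = [(0.1, 0.9), (0.3, 0.2), (0.5, 0.5), (0.8, 0.4), (0.9, 0.7)]
official = [(x, 1.2 + 2.0 * x[0] + 1.0 * x[1]) for x in official_x]

zero = [0.0, 0.0, 0.0]
scratch = train(zero, official, epochs=20)          # official data only
pretrained = train(zero, mock, epochs=200)          # stage 1: mock exams
finetuned = train(pretrained, official, epochs=20)  # stage 2: official essays

print("scratch  :", mse(scratch, official))
print("finetuned:", mse(finetuned, official))
```

With a real LLM the same schedule would be expressed through a fine-tuning framework; the point of the sketch is only that stage 2 starts from the stage 1 weights rather than from an untrained model.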

Topics

Automated Essay Scoring
ENEM Exam
Brazilian Portuguese
Large Language Models
Quadratic Weighted Kappa

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.