Automatic Evaluation of Enem Essays: A Comparative Analysis between Feature Engineering and Transformers

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Education & Learning — Educational Technology (EdTech), Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

This work presents a comparative analysis of Automatic Essay Scoring (AES) methodologies for the Brazilian National High School Exam (Enem), focusing on competency-level evaluation in Brazilian Portuguese. The authors compare feature-based models, built on TF-IDF and linguistic metrics from NILC-Metrix, with transformer-based models. Experiments on the Enem-AES corpus cover both classification and regression formulations; regression generally proved more suitable because the scores are ordinal. Transformer models performed best on the competencies related to language use and textual cohesion, while feature-based methods performed comparably on thematic relevance. All approaches reached high accuracy under Enem's tolerance criterion but struggled to predict extreme scores, primarily because of corpus imbalance. The results suggest that hybrid systems could be a promising direction.

Key takeaway

For NLP engineers building AES systems for large-scale educational assessments: prefer regression over multiclass classification, since it captures the ordinal nature of essay scores. A hybrid approach that combines transformer-based and feature-based representations may help, because each family was stronger on different scoring competencies. Plan for corpus imbalance from the start, as it was the main obstacle to predicting extreme scores accurately.

Key insights

Regression formulations are generally more suitable for ordinal essay scoring than multiclass classification.

Principles

Regression accommodates ordinal scores better than multiclass classification.
Corpus imbalance hinders extreme score prediction.
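
The two principles can be illustrated with a small sketch. It snaps continuous regression outputs to the nearest valid score and compares exact-match accuracy with tolerance-based accuracy. The score grid (0 to 200 in steps of 40 per competency) and the 80-point tolerance are assumptions, since this summary states neither, and all function names and sample values are hypothetical.

```python
# Assumed Enem competency score grid: 0, 40, ..., 200 (not stated in the summary).
SCORE_LEVELS = [0, 40, 80, 120, 160, 200]


def snap_to_level(raw_score):
    """Map a continuous regression output to the nearest valid score level."""
    return min(SCORE_LEVELS, key=lambda level: abs(level - raw_score))


def exact_accuracy(gold, predicted):
    """Fraction of predictions that match the gold score exactly."""
    hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return hits / len(gold)


def tolerance_accuracy(gold, predicted, tolerance=80):
    """Fraction of predictions within `tolerance` points of the gold score.

    The 80-point default is an assumption made for illustration.
    """
    hits = sum(1 for g, p in zip(gold, predicted) if abs(g - p) <= tolerance)
    return hits / len(gold)


gold = [120, 160, 200, 40]
raw_outputs = [131.0, 118.5, 149.0, 95.2]  # hypothetical regression outputs
predicted = [snap_to_level(r) for r in raw_outputs]

print(predicted)
print(exact_accuracy(gold, predicted))
print(tolerance_accuracy(gold, predicted))
```

In this toy data every prediction falls within the tolerance, yet the two extreme essays (200 and 40) are both pulled toward the middle of the scale. That is the pattern the summary describes: high tolerance-based accuracy can coexist with poor prediction of extreme scores.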

Method

Evaluated feature-based models (TF-IDF, NILC-Metrix) and transformer-based models on the Enem-AES corpus using classification and regression for competency-level essay scoring.
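
As a rough illustration of the feature-based side, the sketch below computes TF-IDF vectors in plain Python. It is not the authors' pipeline: whitespace tokenization, raw term counts, and the smoothed IDF variant are simplifying assumptions, the example sentences are invented, and the NILC-Metrix linguistic metrics are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using raw term counts and a smoothed IDF."""
    # Simplifying assumption: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for tokens in tokenized for tok in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))

    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no term gets zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[t] * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Invented example sentences, not taken from the Enem-AES corpus.
essays = [
    "a redação aborda o tema",
    "a redação foge do tema proposto",
]
vocabulary, vectors = tfidf_vectors(essays)
```

Each essay becomes a fixed-length vector that a regressor or classifier can consume. A hybrid system, as the summary suggests, could concatenate such vectors with linguistic metrics or transformer embeddings before the scoring model.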

In practice

Use regression for ordinal scoring tasks.
Consider hybrid models for improved AES.
Address corpus imbalance for extreme score accuracy.
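
One common way to act on the imbalance point is to weight training examples inversely to how often their score occurs. The summary does not say which mitigation, if any, the authors applied, so the sketch below is only one option; the function name and the score distribution are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(scores):
    """Weight each example by n_samples / (n_levels * count(score)).

    Rare score levels receive larger weights, and the weights sum to n_samples.
    """
    counts = Counter(scores)
    n_samples = len(scores)
    n_levels = len(counts)
    return [n_samples / (n_levels * counts[s]) for s in scores]


# Hypothetical distribution skewed toward mid-range scores.
scores = [120, 120, 120, 160, 160, 200]
weights = inverse_frequency_weights(scores)

for score, weight in zip(scores, weights):
    print(score, round(weight, 3))
```

The resulting weights can be passed to any training routine that accepts per-sample weights, so that errors on rare extreme scores cost more than errors on the frequent mid-range scores.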

Topics

Automatic Essay Scoring
Enem Assessment
Transformer Models
Feature Engineering
Brazilian Portuguese

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.