Automatic Evaluation of Enem Essays: A Comparative Analysis between Feature Engineering and Transformers
Summary
This work presents a comparative analysis of Automatic Essay Scoring (AES) methodologies for the Brazilian National High School Exam (Enem), focusing on competency-level evaluation in Brazilian Portuguese. The authors evaluated feature-based models that use TF-IDF and linguistic metrics from NILC-Metrix, alongside transformer-based models. Experiments on the Enem-AES corpus considered both classification and regression formulations, with regression generally proving more suitable because the scores are ordinal. Transformer models excelled in competencies related to language use and textual cohesion, while feature-based methods performed comparably for thematic relevance. Although all approaches achieved high accuracy under Enem's tolerance criterion, they struggled to predict extreme scores, primarily because of corpus imbalance. This suggests that hybrid systems could offer a promising solution.
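To make "accuracy under Enem's tolerance" concrete, here is a minimal sketch of a tolerance-based accuracy metric. It is an illustration, not the paper's evaluation code. The default tolerance of 80 points and the sample scores are assumptions chosen for the example; the summary does not state the exact threshold the authors used.

```python
def accuracy_within_tolerance(gold, predicted, tolerance=80):
    """Fraction of predictions whose absolute error is at most `tolerance` points.

    The default of 80 is an assumed threshold for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one essay")
    hits = sum(abs(g - p) <= tolerance for g, p in zip(gold, predicted))
    return hits / len(gold)


def exact_accuracy(gold, predicted):
    """Strict accuracy: a prediction counts only if it matches the gold score."""
    return accuracy_within_tolerance(gold, predicted, tolerance=0)


# Hypothetical scores for one competency (not taken from the paper).
gold = [0, 120, 160, 200, 120, 80]
pred = [120, 120, 120, 160, 160, 80]

print("within tolerance:", accuracy_within_tolerance(gold, pred))
print("exact match:", exact_accuracy(gold, pred))
```

In this toy data the only prediction outside the tolerance is the essay with a gold score of 0. A model that rarely predicts extreme scores can still look strong on a tolerance-based metric, which is consistent with the pattern described above.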
Key takeaway
For NLP engineers developing AES systems for large-scale educational assessments: prioritize regression formulations over multiclass classification to better capture the ordinal nature of essay scores. Your models will likely benefit from a hybrid approach that combines transformer-based and feature-based representations, especially when the scoring competencies are diverse. Be prepared to mitigate corpus imbalance, as it significantly reduces prediction accuracy for extreme scores.
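A regressor returns a continuous value, so its output still has to be mapped back to a valid score. The sketch below shows one way to do that by clipping and rounding to the nearest step. It assumes each competency is scored from 0 to 200 in steps of 40; the function name and the half-up tie rule are choices made for this example, not details from the paper.

```python
import math


def snap_to_scale(raw, step=40, low=0, high=200):
    """Map a continuous regression output to the nearest valid score.

    Assumes a scale from `low` to `high` in increments of `step`.
    Ties (for example 100, halfway between 80 and 120) round up.
    """
    clipped = min(max(raw, low), high)
    # floor(x + 0.5) gives deterministic half-up rounding, unlike round(),
    # which rounds ties to the nearest even integer.
    steps = math.floor((clipped - low) / step + 0.5)
    return int(low + step * steps)


# Hypothetical raw model outputs, including out-of-range values.
for raw in (133.7, -15.0, 250.0, 100.0):
    print(raw, "->", snap_to_scale(raw))
```

Because the model is trained on numeric distance, predicting 160 for a 200-point essay is penalized less than predicting 40. A plain multiclass loss treats both mistakes as equally wrong.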
Key insights
Regression formulations are generally more suitable for ordinal essay scoring than multiclass classification.
Principles
- Regression accommodates ordinal scores better than multiclass classification.
- Corpus imbalance hinders extreme score prediction.
Method
The authors evaluated feature-based models (TF-IDF and NILC-Metrix linguistic metrics) and transformer-based models on the Enem-AES corpus, framing competency-level essay scoring as both classification and regression.
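As a rough picture of the feature-based side, the sketch below computes TF-IDF vectors in plain Python and appends a few linguistic metrics to each one. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the concatenation step, and the metric values are all assumptions for illustration. In practice the metrics would come from NILC-Metrix rather than being typed in by hand.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using length-normalized TF and smoothed IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
        for tok in vocabulary
    }
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


def feature_based_input(tfidf_vector, linguistic_metrics):
    """Concatenate lexical and linguistic features into one input vector."""
    return list(tfidf_vector) + list(linguistic_metrics)


# Toy essays and placeholder metric values (hypothetical, not NILC-Metrix output).
essays = ["a redação aborda o tema", "o tema é relevante"]
metrics = [[0.42, 12.0], [0.37, 9.0]]

vocabulary, vectors = tfidf_vectors(essays)
inputs = [feature_based_input(v, m) for v, m in zip(vectors, metrics)]
print(len(vocabulary), len(inputs[0]))
```

The same concatenation idea extends to a hybrid system: a transformer embedding of the essay could be appended alongside these features before the final regressor.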
In practice
- Use regression for ordinal scoring tasks.
- Consider hybrid models for improved AES.
- Address corpus imbalance to improve accuracy on extreme scores.
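One common way to act on the imbalance point is to weight training examples inversely to how often their score appears, so that rare extreme scores contribute more to the loss. The sketch below shows that heuristic. It is a generic technique and is not claimed to be the mitigation used in the paper; the score distribution is invented for the example.

```python
from collections import Counter


def inverse_frequency_weights(scores):
    """Per-example weights: n_samples / (n_distinct_scores * count_of_that_score).

    Rare scores receive weights above 1, frequent scores below 1,
    and the weights sum to the number of examples.
    """
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    n_samples = len(scores)
    n_distinct = len(counts)
    return [n_samples / (n_distinct * counts[s]) for s in scores]


# Hypothetical, deliberately imbalanced training labels for one competency.
train_scores = [120, 120, 120, 120, 160, 160, 200, 0]
weights = inverse_frequency_weights(train_scores)

for score, weight in sorted(set(zip(train_scores, weights))):
    print(score, weight)
```

Many training APIs accept per-example weights like these, which makes this an inexpensive first experiment before trying resampling or data augmentation.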
Topics
- Automatic Essay Scoring
- Enem Assessment
- Transformer Models
- Feature Engineering
- Brazilian Portuguese
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.