Automatic Evaluation of ENEM Essays: An Empirical Study on Linguistic and Contextual Representations
Summary
Automatic Essay Scoring (AES) for Brazilian Portuguese, specifically for the Enem exam, remains a complex task because essays are assessed on multiple competencies and scored on an ordinal scale. This study investigates hybrid modeling strategies for competency-level AES that integrate explicit linguistic features with contextual representations. The authors used the Enem-AES corpus and modeled the evaluation of each competency as an ordinal prediction problem using the CORAL framework. The empirical comparison covered traditional lexical representations, linguistic metrics from NILC-Metrix, task-oriented manual features, contextual embeddings, and combinations of these. Hybrid models achieved the highest average agreement with human scores, though performance varied by competency and representation type. The analysis also examined model behavior when raters disagree, underscoring the impact of annotation variability on performance.
Key takeaway
Research scientists developing AES systems for high-stakes exams like Enem should prioritize hybrid modeling strategies that integrate explicit linguistic features with contextual embeddings. This approach showed the highest average agreement with human scores, but expect to tune models separately for each competency and to account for the effect of human rater disagreement on system performance.
Key insights
Hybrid models that combine linguistic features with contextual embeddings achieved the highest average agreement with human scores on Enem essays in Brazilian Portuguese.
Principles
- AES performance varies across competencies.
- Annotation variability impacts model performance.
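
To make "agreement" concrete, the sketch below computes quadratic weighted kappa (QWK) between two sets of ordinal ratings. This summary does not name the agreement metric used in the study; QWK is assumed here only because it is a common choice in AES work. The function name and the convention of passing ratings as rank indices (0 to n_levels - 1) are illustrative.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, n_levels):
    """Agreement between two lists of ordinal ranks in 0..n_levels-1.

    Assumption: QWK is used for illustration; the summarized study's
    metric is not stated here.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * n_levels for _ in range(n_levels)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1

    # Marginal counts per rank for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # Disagreements are penalized by squared rank distance.
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0.0:
        # Both raters used a single identical rank: treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Applying the same function to two human raters, and then to the model against each rater, gives a rough sense of how much disagreement is already present in the annotations before any model is involved.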
Method
The study modeled AES competency evaluation as an ordinal prediction problem using the CORAL framework, comparing lexical, linguistic, manual, and contextual representations on the Enem-AES corpus.
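
The ordinal formulation can be illustrated with a minimal sketch of CORAL-style label encoding and decoding. It assumes six score levels per competency (0 to 200 in steps of 40), which this summary does not state, and it is not the authors' implementation. In CORAL, a label with K ordered ranks becomes K - 1 binary targets ("is the rank greater than k?"), and the model shares one scoring function across those tasks while learning a separate bias for each. The names `SCORE_LEVELS`, `coral_encode`, and `coral_decode` are hypothetical.

```python
import math

# Assumed competency score levels (not given in this summary).
SCORE_LEVELS = [0, 40, 80, 120, 160, 200]


def coral_encode(score):
    """Turn a score into K-1 binary targets: 1 if rank > k, else 0."""
    rank = SCORE_LEVELS.index(score)
    return [1 if rank > k else 0 for k in range(len(SCORE_LEVELS) - 1)]


def coral_decode(shared_logit, biases):
    """Map a shared logit plus K-1 task biases back to a score.

    Each binary task k has probability sigmoid(shared_logit + biases[k]);
    the predicted rank is the number of tasks with probability above 0.5.
    """
    if len(biases) != len(SCORE_LEVELS) - 1:
        raise ValueError("expected one bias per binary task")
    probs = [1.0 / (1.0 + math.exp(-(shared_logit + b))) for b in biases]
    rank = sum(1 for p in probs if p > 0.5)
    return SCORE_LEVELS[rank]
```

For example, `coral_encode(120)` yields `[1, 1, 1, 0, 0]`. Because every task shares the same logit and only the biases differ, non-increasing biases give non-increasing probabilities across tasks, which is what keeps the predicted rank consistent.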
In practice
- Combine linguistic features with contextual embeddings.
- Use CORAL for ordinal prediction problems.
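
One simple way to combine the two kinds of representation is to standardize the explicit linguistic features and concatenate them with the contextual embedding before feeding an ordinal head. The sketch below assumes plain concatenation; the summary does not say how the study fused its features, and the function names and example values are hypothetical.

```python
def standardize(values, means, stds):
    """Z-score each feature; constant features (std == 0) map to 0.0."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(values, means, stds)]


def hybrid_vector(linguistic, embedding, means, stds):
    """Concatenate standardized linguistic features with an embedding.

    Assumption: simple concatenation, shown for illustration only.
    `means` and `stds` should be computed on the training set.
    """
    if not (len(linguistic) == len(means) == len(stds)):
        raise ValueError("linguistic features, means, and stds must align")
    return standardize(linguistic, means, stds) + list(embedding)


# Hypothetical essay: two linguistic metrics and a 3-dimensional embedding.
features = hybrid_vector(
    linguistic=[10.0, 0.5],
    embedding=[0.1, -0.2, 0.3],
    means=[8.0, 0.5],
    stds=[2.0, 0.1],
)
```

Standardizing first keeps hand-crafted metrics, which can sit on very different scales, from dominating or vanishing next to the embedding dimensions.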
Topics
- Automatic Essay Scoring
- Enem Exam
- Contextual Embeddings
- Linguistic Features
- Ordinal Prediction
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.