Neuro-symbolic Approaches for Rubric-Based Automatic Essay Evaluation of ENEM Essays
Summary
Researchers developed two neuro-symbolic approaches for trait-specific automated essay scoring of essays written for the standardized Brazilian National Entrance Exam (ENEM). Current state-of-the-art systems treat this as a purely statistical predictive task, overlooking the rubrics and guidelines provided to human graders. The first approach utilizes GPT-4o to generate evaluative explanations based on subcriteria from the official ENEM Grader's handbook, which are then used by a statistical model to predict the essay score. The second approach formalizes the grading rubrics as logical rules to derive scores from subcriteria, mirroring human grader methodology. To support training and evaluation, a dataset of 63 essays was annotated with subcriteria by two expert human graders. Empirical results indicate both neuro-symbolic methods achieve performance comparable to purely statistical methods while offering more detailed and interpretable feedback.
Key takeaway
For research scientists developing automated essay scoring systems, consider integrating neuro-symbolic approaches to improve interpretability and provide more actionable feedback. In the reported experiments, both approaches performed comparably to purely statistical methods while producing more detailed, interpretable feedback, which matters for educational applications that require transparency and concrete guidance for students. Two options to explore: use a large language model to generate explanations tied to rubric subcriteria, or formalize the rubric as logical rules that derive the score from those subcriteria.
Key insights
Neuro-symbolic methods can enhance automated essay scoring with interpretability and fine-grained feedback.
Principles
- Integrate human knowledge, such as grading rubrics and guidelines, into AI systems.
- Rubrics can be formalized as logical rules.
Method
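The first method prompts GPT-4o to generate evaluative explanations based on subcriteria from the official ENEM grader's handbook; a statistical model then predicts the essay score from those explanations. The second method formalizes the grading rubrics as logical rules that derive the score from the subcriteria, mirroring how human graders work.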
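To make the second method more concrete, here is a minimal sketch of a rubric expressed as ordered logical rules over subcriteria. The subcriterion names (`on_topic`, `has_thesis`, `uses_evidence`, `cohesive`), the rule conditions, and the mapping to score levels are hypothetical illustrations, not the rules from the paper or from the ENEM handbook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A subcriteria annotation: name -> whether the essay satisfies it.
Subcriteria = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    score: int                               # score awarded if the rule fires
    condition: Callable[[Subcriteria], bool]  # logical condition over subcriteria
    reason: str                              # human-readable justification


# Hypothetical rules, checked from the highest score level downward.
RULES: List[Rule] = [
    Rule(200, lambda s: s["on_topic"] and s["has_thesis"]
         and s["uses_evidence"] and s["cohesive"],
         "on topic, clear thesis, supported by evidence, cohesive"),
    Rule(160, lambda s: s["on_topic"] and s["has_thesis"] and s["uses_evidence"],
         "on topic, clear thesis, supported by evidence"),
    Rule(120, lambda s: s["on_topic"] and s["has_thesis"],
         "on topic with a clear thesis"),
    Rule(80, lambda s: s["on_topic"],
         "on topic only"),
    Rule(0, lambda s: True,
         "does not address the topic"),
]


def score_trait(sub: Subcriteria) -> Tuple[int, str]:
    """Return the score and reason of the first rule whose condition holds."""
    for rule in RULES:
        if rule.condition(sub):
            return rule.score, rule.reason
    raise ValueError("no rule matched")  # unreachable: the last rule always fires


if __name__ == "__main__":
    annotation = {"on_topic": True, "has_thesis": True,
                  "uses_evidence": True, "cohesive": False}
    print(score_trait(annotation))
```

Because each score comes with the rule that produced it, the output doubles as feedback: the student can see which subcriterion kept the essay from the next level.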
In practice
- Use LLMs for explanation generation.
- Formalize rubrics into logical rules.
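The first bullet can be sketched as a small pipeline: build one prompt per subcriterion, collect the generated explanations, and join them with the essay as input for a downstream statistical scorer. Everything named here is an assumption for illustration: the subcriteria, the prompt wording, and the `generate` callable, which stands in for whatever LLM client is used and is deliberately not tied to any real API.

```python
from typing import Callable, Dict

# Hypothetical subcriteria with short descriptions; not taken from the handbook.
SUBCRITERIA: Dict[str, str] = {
    "on_topic": "The essay addresses the proposed theme.",
    "has_thesis": "The essay states a clear point of view.",
    "uses_evidence": "The essay supports its claims with evidence.",
}


def build_prompt(essay: str, name: str, description: str) -> str:
    """Compose a prompt asking for an explanation about one subcriterion."""
    return (
        f"Subcriterion: {name}\n"
        f"Definition: {description}\n"
        f"Essay:\n{essay}\n"
        "Explain whether the essay meets this subcriterion and why."
    )


def explain_essay(essay: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Call the (injected) text generator once per subcriterion."""
    return {
        name: generate(build_prompt(essay, name, description))
        for name, description in SUBCRITERIA.items()
    }


def to_model_input(essay: str, explanations: Dict[str, str]) -> str:
    """Concatenate essay and explanations into one text for a statistical scorer."""
    parts = [essay]
    for name in sorted(explanations):
        parts.append(f"[{name}] {explanations[name]}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without any external service.
    def fake_generate(prompt: str) -> str:
        return "placeholder explanation"

    essay_text = "Sample essay text."
    explanations = explain_essay(essay_text, fake_generate)
    print(to_model_input(essay_text, explanations))
```

Passing the generator in as a parameter keeps the explanation step testable offline and makes it easy to swap models without touching the scoring code.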
Topics
- Neuro-symbolic AI
- Automatic Essay Evaluation
- ENEM Essays
- Rubric-Based Scoring
- Large Language Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.