Neuro-symbolic Approaches for Rubric-Based Automatic Essay Evaluation of ENEM Essays

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

Researchers developed two neuro-symbolic approaches for trait-specific automated essay scoring of essays written for the standardized Brazilian National Entrance Exam (ENEM). Current state-of-the-art systems treat this as a purely statistical prediction task, overlooking the rubrics and guidelines provided to human graders. The first approach uses GPT-4o to generate evaluative explanations based on subcriteria from the official ENEM grader's handbook; a statistical model then uses these explanations to predict the essay score. The second approach formalizes the grading rubrics as logical rules that derive scores from subcriteria, mirroring how human graders work. To support training and evaluation, two expert human graders annotated a dataset of 63 essays with subcriteria. Empirical results indicate that both neuro-symbolic methods achieve performance comparable to purely statistical methods while offering more detailed and interpretable feedback.

Key takeaway

Research scientists developing automated essay scoring systems should consider neuro-symbolic approaches to improve interpretability and provide more actionable feedback. In the reported experiments, these methods performed comparably to purely statistical ones while offering richer insights, which matters for educational applications that require transparency and detailed guidance for students. Two options to explore are using large language models to generate explanations and formalizing rubrics as logical rules.

Key insights

Neuro-symbolic methods can enhance automated essay scoring with interpretability and fine-grained feedback.

Method

The first method uses GPT-4o to generate evaluative explanations, which a statistical model then uses to predict the score. The second formalizes grading rubrics as logical rules that derive scores from subcriteria, mimicking human evaluation.
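
The rule-based idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the subcriteria (has_thesis, argument_count, has_conclusion), the rules, and the names Subcriteria, Rule, and score_trait are all hypothetical, chosen only to show how an ordered rule list can map annotated subcriteria to a trait score and report which rule produced it. The score values follow ENEM's per-competency scale of 0 to 200 in steps of 40.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Subcriteria:
    """Hypothetical subcriteria for one trait (not the official ENEM ones)."""
    has_thesis: bool
    argument_count: int
    has_conclusion: bool


@dataclass(frozen=True)
class Rule:
    """A rubric rule: if the condition holds, the trait receives this score."""
    description: str
    condition: Callable[[Subcriteria], bool]
    score: int


# Illustrative rules, ordered from highest to lowest score; the first match wins.
RULES: List[Rule] = [
    Rule("thesis, two or more arguments, and a conclusion",
         lambda s: s.has_thesis and s.argument_count >= 2 and s.has_conclusion,
         200),
    Rule("thesis and two or more arguments",
         lambda s: s.has_thesis and s.argument_count >= 2,
         160),
    Rule("thesis and at least one argument",
         lambda s: s.has_thesis and s.argument_count >= 1,
         120),
    Rule("thesis only",
         lambda s: s.has_thesis,
         80),
    Rule("at least one argument but no thesis",
         lambda s: s.argument_count >= 1,
         40),
]


def score_trait(sub: Subcriteria) -> Tuple[int, str]:
    """Return the trait score and the description of the rule that fired."""
    for rule in RULES:
        if rule.condition(sub):
            return rule.score, rule.description
    return 0, "no rule matched"


if __name__ == "__main__":
    example = Subcriteria(has_thesis=True, argument_count=1, has_conclusion=False)
    score, reason = score_trait(example)
    print(f"score={score}, because: {reason}")
```

Because every score is tied to a named rule, the returned description doubles as feedback for the student, which is the kind of interpretability the summary attributes to this approach. In a full pipeline the Subcriteria values would come from human annotation or a trained predictor rather than being written by hand.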

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.