PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback
Summary
PsyScore is a psychometrically-aware framework for Automated Essay Scoring (AES) that integrates diagnostic assessment with instructional scaffolding. It addresses limitations of existing AES systems by unifying scoring and feedback through a shared latent ability representation. The framework comprises a Trait-Adaptive Neural IRT Scorer, a ZPD-Scaffolded Feedback Generator, and a Multi-Perspective Feedback Evaluation Strategy. Experiments on the ASAP++ dataset show that PsyScore achieves a state-of-the-art average Quadratic Weighted Kappa (QWK) of 0.747, surpassing the strongest baseline (SaMRL-large, 0.722). Its feedback is also pedagogically aligned, yielding a 17.38% normalized gain for low-proficiency students and shifting AES from summative scoring toward formative diagnosis.
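For readers unfamiliar with the headline metric, the following is a minimal sketch of Quadratic Weighted Kappa using its standard textbook definition. The function name and arguments are illustrative, and nothing here is taken from the PsyScore code; the paper's exact evaluation protocol (for example, how scores are averaged across prompts and traits) is not described in this summary.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """QWK between two equal-length lists of integer scores in [min_score, max_score]."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and equal in length")
    n_cat = max_score - min_score + 1
    if n_cat < 2:
        raise ValueError("need at least two score categories")
    if any(not (min_score <= s <= max_score) for s in list(rater_a) + list(rater_b)):
        raise ValueError("score outside the declared range")

    n = len(rater_a)
    # Observed joint counts: observed[i][j] = essays scored i by A and j by B.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty grows with the squared distance between scores.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used the same single category for every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Because the penalty is quadratic, a prediction that is off by two score points costs four times as much as one that is off by one.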
Key takeaway
For NLP engineers building educational AI, PsyScore shows how pairing psychometric modeling with ZPD-aligned feedback can lead to more effective systems. Consider adopting a shared latent ability representation to unify diagnostic scoring and instructional scaffolding. This approach is reported to make feedback more actionable and adaptive, particularly for low-proficiency learners, and it moves the goal beyond predictive accuracy toward genuine learning gains.
Key insights
PsyScore unifies essay scoring and adaptive feedback using a shared psychometric latent ability representation.
Principles
- Psychometric grounding improves AES validity and interpretability.
- Feedback effectiveness increases with Zone of Proximal Development (ZPD) alignment.
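One way to read "ZPD alignment" operationally is to target skills that sit just above a student's current estimated ability. The sketch below illustrates that reading only. The function `zpd_targets`, the per-trait difficulty values, and the `band` width are all hypothetical and are not taken from the paper, which may define the zone quite differently.

```python
def zpd_targets(theta, trait_difficulty, band=1.0):
    """Return trait names whose difficulty lies in (theta, theta + band], easiest first.

    theta            -- estimated latent ability of the student (hypothetical scale)
    trait_difficulty -- dict mapping trait name to a difficulty on the same scale
    band             -- assumed width of the zone above current ability
    """
    in_band = [
        (difficulty, name)
        for name, difficulty in trait_difficulty.items()
        if theta < difficulty <= theta + band
    ]
    # Sort by difficulty so feedback starts with the most reachable trait.
    return [name for _, name in sorted(in_band)]


# Illustrative values only; these are not ASAP++ statistics.
difficulties = {
    "word_choice": -0.5,   # already mastered: below current ability
    "organization": 0.4,   # within reach
    "conventions": 0.9,    # within reach
    "voice": 2.0,          # too far above current ability for now
}
targets = zpd_targets(theta=0.0, trait_difficulty=difficulties)
```

Under these made-up numbers, feedback would focus on organization and conventions, skipping both the mastered trait and the one that is currently out of reach.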
Method
PsyScore estimates latent ability with a neural Generalized Partial Credit Model (GPCM) scorer, then generates ZPD-aligned feedback with a multi-agent system. The feedback is evaluated through revision simulation and expert assessment.
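Assuming GPCM here refers to the Generalized Partial Credit Model from Item Response Theory, the sketch below shows the standard GPCM category probabilities for a single trait. How PsyScore's neural network produces the ability, discrimination, and threshold parameters is not described in this summary, so they are passed in as plain numbers; all function and parameter names are illustrative.

```python
import math


def gpcm_probs(theta, discrimination, thresholds):
    """Probability of each score category 0..m under the standard GPCM.

    theta          -- latent ability of the student
    discrimination -- slope parameter a of the item (here, an essay trait)
    thresholds     -- step difficulties b_1..b_m, one per transition between scores
    """
    # Cumulative logits: category k sums a * (theta - b_v) for v = 1..k,
    # and category 0 is the empty sum, which equals 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))

    # Softmax with max subtraction for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(theta, discrimination, thresholds):
    """Expected score category: sum of k * P(score = k)."""
    probs = gpcm_probs(theta, discrimination, thresholds)
    return sum(k * p for k, p in enumerate(probs))
```

With `theta=0.0`, `discrimination=1.0`, and `thresholds=[0.0, 0.0]`, every cumulative logit is zero, so the three categories are equally likely and the expected score is 1.0. Raising `theta` shifts probability toward higher categories, which is what lets one ability estimate drive both the predicted score and the choice of feedback.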
In practice
- Condition feedback generation on student latent ability for adaptivity.
- Use multi-agent LLMs for diverse, debiased feedback synthesis.
- Evaluate feedback via simulated student revisions and expert rubrics.
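When evaluating feedback through simulated revisions, a common summary statistic is the normalized gain: the fraction of the available headroom that a revision recovers. The sketch below uses the widely used definition (post - pre) / (max - pre). Whether the paper's reported 17.38% figure uses exactly this formula is an assumption, and the function names are illustrative.

```python
def normalized_gain(pre, post, max_score):
    """Fraction of the remaining headroom gained between pre- and post-revision scores."""
    if pre >= max_score:
        raise ValueError("no headroom: pre-revision score is already at the maximum")
    return (post - pre) / (max_score - pre)


def mean_normalized_gain(score_pairs, max_score):
    """Average normalized gain over (pre, post) pairs, skipping essays with no headroom."""
    gains = [
        normalized_gain(pre, post, max_score)
        for pre, post in score_pairs
        if pre < max_score
    ]
    if not gains:
        raise ValueError("no essays with headroom to improve")
    return sum(gains) / len(gains)
```

For example, on a 6-point scale an essay moving from 2 to 3 has a gain of 0.25, and one moving from 4 to 5 has a gain of 0.5, so the pair averages 0.375. Normalizing by headroom keeps comparisons between low- and high-proficiency groups fair, since high scorers have less room to improve in raw points.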
Topics
- Automated Essay Scoring
- Item Response Theory
- Large Language Models
- Zone of Proximal Development
- Educational AI
- Psychometrics
- ASAP++ dataset
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.