PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Education & Learning — Educational Technology (EdTech), Academic Research & Higher Education, Educational Psychology & Learning Sciences · Depth: Expert, extended

Summary

PsyScore is a psychometrically-aware framework for Automated Essay Scoring (AES) that integrates diagnostic assessment with instructional scaffolding. It addresses limitations of existing AES systems by unifying scoring and feedback through a shared latent ability representation. The framework comprises a Trait-Adaptive Neural IRT Scorer, a ZPD-Scaffolded Feedback Generator, and a Multi-Perspective Feedback Evaluation Strategy. Experiments on the ASAP++ dataset show that PsyScore achieves a state-of-the-art average Quadratic Weighted Kappa (QWK) of 0.747, surpassing the strongest baseline, SaMRL-large (0.722). Its feedback is also pedagogically aligned, yielding a 17.38% normalized gain for low-proficiency students. Together, these results move AES from summative scoring toward formative diagnosis.
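QWK, the headline metric above, is Cohen's kappa with quadratic disagreement weights. A near miss (3 vs. 4) is penalized far less than a large miss (1 vs. 4). The NumPy sketch below follows the conventional definition. It is not the paper's evaluation code, and this page does not say exactly how the reported average is taken across ASAP++ prompts and traits.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Cohen's kappa with quadratic weights over integer ratings in [min_rating, max_rating]."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    if min(y_true.min(), y_pred.min()) < min_rating or max(y_true.max(), y_pred.max()) > max_rating:
        raise ValueError("ratings fall outside [min_rating, max_rating]")
    n = max_rating - min_rating + 1
    # Observed matrix: O[i, j] counts essays rated i by humans and j by the model.
    observed = np.zeros((n, n), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1
    # Expected matrix under independence: outer product of the two marginal histograms.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic weights: the penalty grows with the squared distance between ratings.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy ratings on a 0-2 scale (illustrative only, not ASAP++ data).
print(quadratic_weighted_kappa([0, 1, 2], [0, 2, 2], 0, 2))  # ≈ 0.8
```

Perfect agreement gives 1.0, and chance-level agreement gives about 0. The toy call scores 0.8 because its single error is off by only one rubric point.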

Key takeaway

For NLP engineers developing educational AI, PsyScore's integration of psychometric modeling with ZPD-aligned feedback offers a promising path to more effective systems. Consider adopting a latent ability representation to unify diagnostic scoring and instructional scaffolding. In the reported experiments, this approach made feedback more actionable and adaptive, particularly for low-proficiency learners. It also shifts the goal from predictive accuracy alone to measurable learning gains.
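The summary reports learning gains as a "normalized gain" (17.38% for low-proficiency students) but does not define the term. A common definition in education research is Hake's normalized gain, g = (post − pre) / (max − pre). It measures the share of the available headroom that a learner actually gained. The sketch assumes that definition, which this page does not confirm. The per-student averaging shown is one convention; Hake's original form is computed on class averages.

```python
def normalized_gain(pre, post, max_score):
    """Hake-style normalized gain: share of the available headroom that was gained."""
    if pre >= max_score:
        raise ValueError("pre must be below max_score, otherwise there is no headroom")
    return (post - pre) / (max_score - pre)

def mean_normalized_gain(pairs, max_score):
    """Average of per-student gains over (pre, post) score pairs (one aggregation convention)."""
    gains = [normalized_gain(pre, post, max_score) for pre, post in pairs]
    return sum(gains) / len(gains)

# Illustrative numbers only: a student moving from 2 to 2.5 on a 5-point scale
# closes one sixth of the remaining gap.
print(normalized_gain(2, 2.5, 5))
```

Because the gain is normalized by headroom, a low-proficiency student and a high-proficiency student with the same raw improvement receive different values. That property matters when reporting results for low-proficiency groups.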

Key insights

PsyScore unifies essay scoring and adaptive feedback using a shared psychometric latent ability representation.

Principles

Psychometric grounding improves AES validity and interpretability.
Feedback effectiveness increases with Zone of Proximal Development (ZPD) alignment.

Method

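PsyScore estimates latent ability with a trait-adaptive neural scorer based on the Generalized Partial Credit Model (GPCM), an Item Response Theory model for ordered score categories. It then generates ZPD-aligned feedback with a multi-agent system. The feedback is evaluated through simulated student revisions and expert assessment.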
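The GPCM links a latent ability θ to the probability of each ordered score category k = 0..m. Each step is governed by a discrimination a and step difficulties b_1..b_m:

P(X = k | θ) ∝ exp( Σ_{v=1..k} a(θ − b_v) ), where the empty sum for k = 0 is 0.

The sketch below computes those probabilities and the expected score for fixed parameters. In a neural, trait-adaptive variant, a network would presumably predict θ from the essay and a, b per trait. How PsyScore parameterizes this is not described here, so the function names and inputs are illustrative assumptions.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities P(X = k | theta), k = 0..m, under the GPCM.

    theta: latent ability (scalar)
    a:     discrimination for this trait/item (assumed > 0)
    b:     step difficulties b_1..b_m (length m)
    """
    b = np.asarray(b, dtype=float)
    # Cumulative sums of a * (theta - b_v); category 0 gets the empty sum 0.
    logits = np.concatenate([[0.0], np.cumsum(a * (theta - b))])
    logits -= logits.max()  # subtract the max for numerical stability before exp
    weights = np.exp(logits)
    return weights / weights.sum()

def expected_score(theta, a, b):
    """Expected rubric score: sum over k of k * P(X = k | theta)."""
    probs = gpcm_probs(theta, a, b)
    return float(np.dot(np.arange(len(probs)), probs))

# Hypothetical parameters for one trait on a 0-3 rubric.
for theta in (-1.0, 0.0, 1.0):
    print(theta, gpcm_probs(theta, 1.2, [-0.5, 0.3, 1.1]).round(3), expected_score(theta, 1.2, [-0.5, 0.3, 1.1]))
```

For a > 0, the expected score rises monotonically with θ. This is what lets a single latent ability serve both as the score and as a diagnostic signal for later feedback.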

In practice

Condition feedback generation on student latent ability for adaptivity.
Use multi-agent LLMs for diverse, debiased feedback synthesis.
Evaluate feedback via simulated student revisions and expert rubrics.
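The first practice above, conditioning feedback generation on student ability, can be sketched as a small planning step that runs before an LLM is called. This is a hypothetical reading of ZPD, not PsyScore's generator. It assumes per-trait expected scores from an IRT-based scorer are already available. It then picks the weakest trait and targets one rubric level above what the student currently reaches, rather than the rubric maximum. The names `FeedbackPlan` and `plan_zpd_feedback`, the floor rule, and the prompt wording are all illustrative. The multi-agent synthesis and revision-simulation evaluation are not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class FeedbackPlan:
    trait: str
    current_level: int
    target_level: int
    prompt: str

def plan_zpd_feedback(essay, trait_scores, max_score):
    """Hypothetical ability-conditioned planner: choose the weakest trait and aim
    feedback one rubric level above the student's current level on it."""
    # Weakest trait = lowest expected score (the first one wins on ties).
    trait = min(trait_scores, key=trait_scores.get)
    # Treat the floor of the expected score as the level reached independently.
    current = math.floor(trait_scores[trait])
    # ZPD-style target: the next level up, capped at the rubric maximum.
    target = min(current + 1, max_score)
    prompt = (
        f"The student currently reaches level {current} of {max_score} on '{trait}'. "
        f"Suggest one or two concrete revisions that would move the essay to "
        f"level {target}, without rewriting it for the student.\n\nEssay:\n{essay}"
    )
    return FeedbackPlan(trait, current, target, prompt)

# Example trait names in the style of ASAP++ rubrics; the scores are made up.
plan = plan_zpd_feedback(
    "Sample essay text.",
    {"Organization": 2.4, "Conventions": 3.1, "Word Choice": 1.6},
    max_score=4,
)
print(plan.trait, plan.current_level, plan.target_level)
```

The design choice is to aim one step above the current level instead of at the top of the rubric. This keeps the feedback within reach for low-proficiency writers, the group for which the summary reports the largest benefit.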

Topics

Automated Essay Scoring
Item Response Theory
Large Language Models
Zone of Proximal Development
Educational AI
Psychometrics
ASAP++ dataset

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.