Beyond Grading Accuracy: Exploring Alignment of TAs and LLMs

2026-03-17 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, short

Summary

A study investigates how well open-source Large Language Models (LLMs) grade Unified Modeling Language (UML) class diagrams, with a focus on transparency and cost-effectiveness for academic institutions. The researchers built a grading pipeline that compares evaluations from teaching assistants (TAs) with those of six open-source LLMs on 92 student-generated UML class diagrams from a software design course. Unlike prior work, the approach assesses performance at the level of individual grading criteria, which gives a more granular view of LLM-human alignment. The quantitative results show per-criterion accuracy of up to 88.56% and a Pearson correlation coefficient of up to 0.78, which the authors report as a marked improvement over previous efforts with open-source models. These results indicate that open-source LLMs can effectively support UML class diagram grading, offering a viable path toward mixed-initiative grading systems that help TAs manage growing student workloads.

Key takeaway

For university educators and software engineering instructors facing rising enrollments, open-source LLMs offer a practical way to support the grading of UML class diagrams. Using LLMs for initial criterion-level assessments may reduce TA workload and free TAs to focus on nuanced feedback. Consider piloting a mixed-initiative grading system in which LLMs provide a baseline grade that TAs review, with the aim of improving efficiency and consistency while keeping the process transparent and inexpensive.

Key insights

Open-source LLMs can effectively grade UML class diagrams with high per-criterion accuracy, aiding TAs.

Principles

Open-source LLMs offer transparent, cost-effective grading.
Criterion-level evaluation reveals LLM-human alignment.
LLMs can come close to TA grading on individual criteria (up to 88.56% per-criterion accuracy, Pearson correlation up to 0.78).

Method

The proposed grading pipeline involves independent evaluation of student UML class diagrams by TAs and LLMs, followed by comparison of grades at the individual criterion level.
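
The comparison step can be illustrated with a minimal sketch. It is not the authors' pipeline: the criterion names, the point scales, the toy grades, and the choice to compute the Pearson correlation over total grades per diagram are all assumptions made for illustration. The summary does not say how the paper defines agreement, so exact score equality is used here.

```python
from math import sqrt

# Toy data (hypothetical): diagram id -> criterion -> points awarded.
ta_grades = {
    "d1": {"classes": 2, "attributes": 1, "relationships": 0},
    "d2": {"classes": 2, "attributes": 2, "relationships": 1},
    "d3": {"classes": 1, "attributes": 0, "relationships": 0},
    "d4": {"classes": 2, "attributes": 2, "relationships": 2},
}
llm_grades = {
    "d1": {"classes": 2, "attributes": 1, "relationships": 1},
    "d2": {"classes": 2, "attributes": 2, "relationships": 1},
    "d3": {"classes": 1, "attributes": 1, "relationships": 0},
    "d4": {"classes": 2, "attributes": 2, "relationships": 2},
}


def accuracy_by_criterion(ta, llm):
    """For each criterion, the fraction of diagrams where LLM and TA scores match."""
    matches, totals = {}, {}
    for diagram_id, ta_scores in ta.items():
        for criterion, ta_score in ta_scores.items():
            totals[criterion] = totals.get(criterion, 0) + 1
            if llm[diagram_id][criterion] == ta_score:
                matches[criterion] = matches.get(criterion, 0) + 1
    return {c: matches.get(c, 0) / totals[c] for c in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


per_criterion = accuracy_by_criterion(ta_grades, llm_grades)

# Assumption: correlation is taken over the total grade per diagram.
diagram_ids = sorted(ta_grades)
ta_totals = [sum(ta_grades[d].values()) for d in diagram_ids]
llm_totals = [sum(llm_grades[d].values()) for d in diagram_ids]
correlation = pearson(ta_totals, llm_totals)

print(per_criterion)
print(round(correlation, 2))
```

Reporting agreement per criterion rather than only per diagram shows which rubric items the model handles well and which it does not. In this toy data the two graders always agree on `classes` but disagree on one diagram each for `attributes` and `relationships`.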

In practice

Implement LLM-aided grading for UML diagrams.
Use open-source LLMs for cost-sensitive academic settings.
Focus evaluation on individual grading criteria.
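
One way to picture a mixed-initiative workflow is as an LLM baseline that a TA can override criterion by criterion. The sketch below is purely illustrative: the function name `apply_ta_review`, the criterion names, and the scores are hypothetical, and the summary does not describe how the authors would implement such a system.

```python
def apply_ta_review(llm_baseline, ta_overrides):
    """Merge LLM baseline scores with TA overrides, recording who set each score."""
    final = {}
    for criterion, score in llm_baseline.items():
        if criterion in ta_overrides:
            # The TA reviewed this criterion and changed the grade.
            final[criterion] = {"score": ta_overrides[criterion], "source": "ta"}
        else:
            # The TA left the LLM's proposed grade unchanged.
            final[criterion] = {"score": score, "source": "llm"}
    return final


# Hypothetical baseline for one diagram and a single TA correction.
baseline = {"classes": 2, "attributes": 1, "relationships": 1}
overrides = {"relationships": 0}

final = apply_ta_review(baseline, overrides)
total = sum(entry["score"] for entry in final.values())
print(final)
print(total)
```

Keeping the source of each score alongside the score itself preserves transparency: a student or instructor can see which parts of a grade came from the model and which from a human reviewer.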

Topics

Large Language Models
UML Class Diagrams
Automated Grading
Software Engineering Education
Open-source LLMs
Academic Workload Management

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.