Beyond Grading Accuracy: Exploring Alignment of TAs and LLMs
Summary
A study investigates how well open-source Large Language Models (LLMs) can grade Unified Modeling Language (UML) class diagrams, focusing on transparency and cost-effectiveness for academic institutions. The researchers developed a grading pipeline that compares evaluations from teaching assistants (TAs) and six open-source LLMs on 92 student-generated UML class diagrams from a software design course. Unlike prior work, this approach assesses performance at the level of individual grading criteria, giving a granular view of LLM-human alignment. The quantitative study found per-criterion accuracy of up to 88.56% and a Pearson correlation coefficient of up to 0.78, significantly outperforming previous efforts with open-source models. These results indicate that open-source LLMs can effectively support UML class diagram grading, offering a viable path toward mixed-initiative grading systems that help TAs manage growing student workloads.
Key takeaway
For university educators and software engineering instructors facing increasing student enrollments, integrating open-source LLMs into the grading workflow for UML class diagrams offers a practical option. Using LLMs for initial criterion-level assessments may reduce TA workload and free TAs to focus on nuanced feedback. Consider piloting a mixed-initiative grading system in which LLMs provide a baseline that TAs review, with the aim of improving grading efficiency and consistency without compromising transparency or incurring high costs.
Key insights
Open-source LLMs can grade UML class diagrams with per-criterion accuracy of up to 88.56%, making them a practical aid for TAs.
Principles
- Open-source LLMs offer transparent, cost-effective grading.
- Criterion-level evaluation reveals LLM-human alignment.
- Open-source LLMs can come close to TA grades (up to 88.56% per-criterion accuracy, Pearson correlation up to 0.78).
Method
The proposed grading pipeline involves independent evaluation of student UML class diagrams by TAs and LLMs, followed by comparison of grades at the individual criterion level.
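The comparison step can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (a dictionary of per-criterion scores for each submission), the exact-match definition of accuracy, the use of summed scores for the correlation, and all function and criterion names are assumptions made for illustration.

```python
from math import sqrt


def per_criterion_accuracy(ta_grades, llm_grades):
    """Fraction of submissions where the LLM score equals the TA score,
    computed separately for each grading criterion.

    Both arguments are assumed to map submission id -> {criterion: score}.
    """
    hits, totals = {}, {}
    for submission_id, ta_scores in ta_grades.items():
        for criterion, ta_score in ta_scores.items():
            totals[criterion] = totals.get(criterion, 0) + 1
            if llm_grades[submission_id][criterion] == ta_score:
                hits[criterion] = hits.get(criterion, 0) + 1
    return {c: hits.get(c, 0) / totals[c] for c in totals}


def total_scores(grades):
    """Sum of criterion scores per submission, ordered by submission id."""
    return [sum(grades[sid].values()) for sid in sorted(grades)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical grades for four submissions on three made-up criteria.
ta = {
    "s1": {"classes": 1, "attributes": 1, "relationships": 0},
    "s2": {"classes": 1, "attributes": 0, "relationships": 1},
    "s3": {"classes": 0, "attributes": 1, "relationships": 1},
    "s4": {"classes": 1, "attributes": 1, "relationships": 1},
}
llm = {
    "s1": {"classes": 1, "attributes": 1, "relationships": 1},
    "s2": {"classes": 1, "attributes": 0, "relationships": 1},
    "s3": {"classes": 0, "attributes": 0, "relationships": 1},
    "s4": {"classes": 1, "attributes": 1, "relationships": 1},
}

print(per_criterion_accuracy(ta, llm))
print(pearson(total_scores(ta), total_scores(llm)))
```

Reporting accuracy per criterion rather than as a single aggregate shows which rubric items an LLM handles reliably and which still need TA review.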
In practice
- Implement LLM-aided grading for UML diagrams.
- Use open-source LLMs for cost-sensitive academic settings.
- Focus evaluation on individual grading criteria.
Topics
- Large Language Models
- UML Class Diagrams
- Automated Grading
- Software Engineering Education
- Open-source LLMs
- Academic Workload Management
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.