Can AI tools assess coding assignments?

· Source: Machine learning : nature.com subject feeds · Field: Education & Learning — Educational Technology (EdTech), Academic Research & Higher Education · Depth: Intermediate, short

Summary

An experiment conducted by a Harvard Medical School PhD student and a higher-education researcher explored the use of generative AI, specifically OpenAI's ChatGPT 5.4, for assessing undergraduate coding assignments. Initially, ChatGPT struggled, primarily comparing student code to a reference solution and focusing on minor inefficiencies rather than conceptual understanding. The researchers improved its utility by providing crucial context, including common student mistakes and which minor issues to disregard. This enhanced approach allowed ChatGPT to identify flaws in student logic and propose "edge cases" for testing, which significantly reduced manual inspection time for complex algorithms like genome sequence alignment. However, the AI still exhibited limitations, such as misidentifying valid alternative solutions as errors and generating confident but incorrect explanations, indicating that fully automated grading remains impractical.

Key takeaway

For educators assessing coding assignments, integrate generative AI as a supplementary tool rather than a fully automated grader. Provide the AI with comprehensive context, including common student errors and acceptable variations, to improve its diagnostic capabilities. Leverage its strength in generating "edge cases" to create more robust rubrics and identify subtle logical flaws, ultimately saving significant manual inspection time while preserving human interpretative judgment.
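One way to act on the edge-case advice is to run each submission against a reference implementation on a small set of boundary inputs, so that a submission is judged on its behavior rather than on how closely its text matches the reference. The sketch below is illustrative only: the scoring scheme (match +1, mismatch -1, gap -1), the function names, the edge-case list, and the buggy submission are all assumptions and are not taken from the original experiment.

```python
from typing import Callable, Iterable

Scorer = Callable[[str, str], int]


def reference_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score (Needleman-Wunsch), keeping one DP row at a time."""
    # Row 0: aligning the empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: a[:i] against the empty prefix of `b`.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def buggy_student_score(a: str, b: str) -> int:
    """Hypothetical submission that forgets the gap penalties on the borders."""
    prev = [0] * (len(b) + 1)  # Bug: should be j * gap.
    for i in range(1, len(a) + 1):
        curr = [0]  # Bug: should be i * gap.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (1 if a[i - 1] == b[j - 1] else -1)
            curr.append(max(diag, prev[j] - 1, curr[j - 1] - 1))
        prev = curr
    return prev[-1]


def failing_cases(student: Scorer, reference: Scorer,
                  cases: Iterable[tuple[str, str]]) -> list[tuple]:
    """Return (a, b, expected, got) for every case where the outputs differ."""
    failures = []
    for a, b in cases:
        expected = reference(a, b)
        try:
            got = student(a, b)
        except Exception as exc:  # A crash counts as a failure, not a grader error.
            got = f"raised {type(exc).__name__}"
        if got != expected:
            failures.append((a, b, expected, got))
    return failures


# Assumed edge cases: empty inputs, identical inputs, no shared characters,
# and sequences of unequal length.
EDGE_CASES = [
    ("", ""), ("", "ACGT"), ("ACGT", ""), ("A", "A"),
    ("ACGT", "ACGT"), ("AAAA", "TTTT"), ("ACGT", "AGT"),
]

for failure in failing_cases(buggy_student_score, reference_score, EDGE_CASES):
    print(failure)
```

In this sketch the buggy submission agrees with the reference on the typical inputs and differs only when one sequence is empty, which is the kind of subtle flaw that edge cases are meant to surface. Because the check compares outputs, a correct submission written in a different style would pass it.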

Key insights

Generative AI can assist in coding assignment assessment when integrated thoughtfully with human expertise and context.

Method

Give the AI the problem sets, the reference solutions, and explicit guidance on common student errors and on minor issues that should not be penalized. Then use it to generate additional test cases, especially edge cases, to make the rubric more thorough.
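The context described in this method can be gathered into a single structured prompt before any submission is reviewed. The following is a minimal sketch under stated assumptions: the `GradingContext` fields, the section labels, and the instruction wording are hypothetical, and the sketch deliberately stops at building the prompt text rather than calling any particular model API.

```python
from dataclasses import dataclass, field


@dataclass
class GradingContext:
    """Hypothetical container for what the model is told before grading."""
    problem_statement: str
    reference_solution: str
    common_mistakes: list[str] = field(default_factory=list)
    ignorable_issues: list[str] = field(default_factory=list)


def _bullets(items: list[str]) -> str:
    """Render a list as plain-text bullets, with a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) if items else "- (none listed)"


def build_grading_prompt(ctx: GradingContext, student_code: str) -> str:
    """Assemble the grading prompt; the wording here is an assumption."""
    sections = [
        "You are assisting a human grader. Judge conceptual correctness, "
        "not similarity to the reference solution. Valid alternative "
        "approaches are acceptable.",
        "Problem:\n" + ctx.problem_statement,
        "Reference solution (one valid approach):\n" + ctx.reference_solution,
        "Common student mistakes to look for:\n" + _bullets(ctx.common_mistakes),
        "Minor issues to ignore:\n" + _bullets(ctx.ignorable_issues),
        "Student submission:\n" + student_code,
        "Report any logic flaws, then propose edge-case inputs that would "
        "expose them.",
    ]
    return "\n\n".join(sections)


ctx = GradingContext(
    problem_statement="Compute the global alignment score of two DNA strings.",
    reference_solution="def score(a, b): ...  # instructor's solution goes here",
    common_mistakes=["First row and column not initialised with gap penalties"],
    ignorable_issues=["Variable naming", "Small constant-factor inefficiencies"],
)
prompt = build_grading_prompt(ctx, "def score(a, b): return 0")
print(prompt)
```

Keeping the common mistakes and the ignorable issues in separate lists makes it easy for an instructor to revise either one as new patterns appear across a cohort, and the model's output still goes to a human grader for the final judgment.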

Best for: Research Scientist, Prompt Engineer, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.