Can AI tools assess coding assignments?
Summary
An experiment by a Harvard Medical School PhD student and a higher-education researcher explored using generative AI, specifically OpenAI's ChatGPT 5.4, to assess undergraduate coding assignments. At first ChatGPT struggled: it mostly compared student code to a reference solution and flagged minor inefficiencies rather than gaps in conceptual understanding. The researchers made it more useful by supplying context, including common student mistakes and the minor issues to disregard. With that context, ChatGPT identified flaws in student logic and proposed "edge cases" for testing, which significantly reduced manual inspection time for complex algorithms such as genome sequence alignment. The AI still had limitations: it misidentified valid alternative solutions as errors and generated confident but incorrect explanations, so fully automated grading remains impractical.
Key takeaway
For educators assessing coding assignments, integrate generative AI as a supplementary tool rather than a fully automated grader. Provide the AI with comprehensive context, including common student errors and acceptable variations, to improve its diagnostic capabilities. Leverage its strength in generating "edge cases" to create more robust rubrics and identify subtle logical flaws, ultimately saving significant manual inspection time while preserving human interpretative judgment.
Key insights
Generative AI can assist in coding assignment assessment when integrated thoughtfully with human expertise and context.
Principles
- Assessment is deeply interpretative.
- AI excels at identifying edge cases.
- Context improves AI assessment accuracy.
Method
Provide AI with problem sets, reference solutions, and explicit guidance on common errors and non-penalizable minor issues. Use AI to generate additional test cases, especially for edge scenarios, to enhance rubric thoroughness.
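As a rough illustration of the method, the context could be assembled into one prompt before it is sent to a model. This is a minimal sketch, not the researchers' actual prompt: `GradingContext`, `build_grading_prompt`, and the section titles are hypothetical names chosen for this example, and no model API is called.

```python
from dataclasses import dataclass, field


@dataclass
class GradingContext:
    """Hypothetical container for the material shown to the model."""
    problem_statement: str
    reference_solution: str
    common_mistakes: list[str] = field(default_factory=list)
    ignorable_issues: list[str] = field(default_factory=list)


def build_grading_prompt(ctx: GradingContext, student_code: str) -> str:
    """Assemble one prompt string from the context and a student submission."""

    def bullets(items: list[str]) -> str:
        # Render a list as "- item" lines, with a placeholder when empty.
        return "\n".join(f"- {item}" for item in items) or "- (none listed)"

    # Section titles and wording are assumptions made for illustration.
    sections = [
        ("Problem", ctx.problem_statement),
        ("Reference solution (one valid approach, not the only one)",
         ctx.reference_solution),
        ("Common student mistakes to check for", bullets(ctx.common_mistakes)),
        ("Minor issues that must not be penalized",
         bullets(ctx.ignorable_issues)),
        ("Student submission", student_code),
        ("Task",
         "Explain any flaw in the student's logic and propose edge-case "
         "inputs that would expose it. Mark uncertain findings for human "
         "review."),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```

Stating that the reference solution is only one valid approach is a design choice aimed at the limitation noted in the summary, where valid alternative solutions were misidentified as errors. Whether it helps in practice would need to be checked by a human grader.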
In practice
- Use AI as a teaching assistant, not a final grader.
- Supply AI with common student mistakes.
- Incorporate AI-generated edge cases into rubrics.
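One way to fold edge cases into a rubric is to run each submission against a trusted reference on a fixed list of inputs. The sketch below assumes a sequence-alignment assignment graded by global alignment score; the scoring scheme (match +1, mismatch -1, gap -1), the `EDGE_CASES` list, and the function names are illustrative assumptions, not taken from the original article.

```python
def reference_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score (illustrative scoring scheme)."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical edge cases of the kind an AI assistant might suggest.
EDGE_CASES = [
    ("", ""),          # both sequences empty
    ("", "ACGT"),      # one sequence empty
    ("A", "A"),        # single character
    ("ACGT", "ACGT"),  # identical sequences
    ("AAAA", "TTTT"),  # no matching characters
    ("ACGT", "AGT"),   # different lengths, gap required
]


def failing_cases(student_score, cases=EDGE_CASES):
    """Return (a, b, expected, got) for every case where the scores differ."""
    failures = []
    for a, b in cases:
        expected = reference_score(a, b)
        try:
            got = student_score(a, b)
        except Exception as exc:  # a crash counts as a failure, not a halt
            got = f"raised {type(exc).__name__}"
        if got != expected:
            failures.append((a, b, expected, got))
    return failures
```

Comparing scores rather than the alignments themselves avoids penalizing a submission that returns a different but equally optimal alignment. The failing inputs still go to a human, who decides what each one says about the student's understanding.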
Topics
- Coding Assignment Assessment
- Generative AI
- ChatGPT
- Edge Case Generation
- Computational Biology
Best for: Research Scientist, Prompt Engineer, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.