Unlocking LLM Code Correction with Iterative Feedback Loops

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

This study systematically investigates the ability of Large Language Models (LLMs) to self-correct code using iterative execution feedback. The researchers evaluated four state-of-the-art LLMs (DeepSeek-R1, DeepSeek-V3, GPT-o4-mini, GPT-4.1-mini) in two programming languages (Python, Java) on real-world LeetCode problems. The iterative refinement framework fed compiler errors and test-case results back to the model for up to 10 iterations. Reasoning models such as DeepSeek-R1 and GPT-o4-mini improved consistently across iterations and significantly outperformed non-reasoning models at leveraging feedback. Syntactic and runtime errors were far more tractable (fix rates >80%) than logical or algorithmic failures (fix rates <35%), which points to current limits in deep algorithmic reasoning. The study also introduces new metrics, including Iterative Success Rate (ISR@k) and Median Iterations to Solve (MIS), for a more realistic evaluation.

Key takeaway

If you are a machine learning engineer building LLM-driven code generation systems, integrate iterative feedback loops to raise code correctness beyond single-attempt performance. Prioritize clear execution feedback for syntactic and runtime errors, since these are the most amenable to LLM self-correction. Deep algorithmic or logical errors remain challenging and may require alternative strategies or human intervention. Consider metrics such as ISR@k and MIS for a more comprehensive evaluation of your models' real-world utility.
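The summary names ISR@k and MIS but does not give their formulas, so the sketch below rests on assumed definitions: ISR@k is taken as the fraction of all problems solved within k iterations, and MIS as the median iteration count over solved problems only. The function names and the input convention (an iteration count per problem, or None if never solved) are illustrative and do not come from the paper.

```python
from statistics import median
from typing import Optional, Sequence


def isr_at_k(iters_to_solve: Sequence[Optional[int]], k: int) -> float:
    """Assumed ISR@k: share of problems solved in at most k iterations.

    Each entry is the 1-based iteration at which a problem was first
    solved, or None if it was never solved within the budget.
    """
    if not iters_to_solve:
        raise ValueError("need at least one problem")
    solved = sum(1 for n in iters_to_solve if n is not None and n <= k)
    return solved / len(iters_to_solve)


def median_iterations_to_solve(
    iters_to_solve: Sequence[Optional[int]],
) -> Optional[float]:
    """Assumed MIS: median iteration count over solved problems only."""
    solved = [n for n in iters_to_solve if n is not None]
    return median(solved) if solved else None
```

Under these assumptions, ISR@1 reduces to the single-attempt success rate, so comparing ISR@1 with ISR@10 shows how much the feedback loop adds.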

Key insights

Iterative feedback loops significantly enhance LLM code correction, especially for reasoning models and specific error types.

Principles

Method

An automated feedback loop executes LLM-generated code, constructs prompts with failure messages, and provides this execution feedback to the LLM for iterative refinement over up to 10 turns.
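The loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's harness: the `generate` callable stands in for whatever LLM client is used, the prompt wording is invented, and the tests are assumed to be plain Python assert statements appended to the candidate code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_candidate(code: str, tests: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run candidate code plus assert-based tests in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"Timed out after {timeout} seconds"
    # A non-zero exit code covers syntax errors, runtime errors, and failed asserts.
    return proc.returncode == 0, proc.stderr.strip()


def refine(
    problem: str,
    tests: str,
    generate: Callable[[str], str],  # hypothetical LLM wrapper: prompt -> code
    max_iters: int = 10,
) -> Tuple[Optional[str], int]:
    """Return (passing code or None, number of iterations used)."""
    prompt = problem
    for iteration in range(1, max_iters + 1):
        code = generate(prompt)
        ok, feedback = run_candidate(code, tests)
        if ok:
            return code, iteration
        # Build the next prompt from the failed attempt and its error output.
        prompt = (
            f"{problem}\n\nYour previous attempt:\n{code}\n\n"
            f"It failed with:\n{feedback}\n\nReturn a corrected solution."
        )
    return None, max_iters
```

Running each candidate in a separate process keeps crashes and infinite loops in generated code from taking down the harness. A real deployment would also need sandboxing, which this sketch omits.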

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.