Unlocking LLM Code Correction with Iterative Feedback Loops

2026-06-17 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

This study systematically investigates the ability of Large Language Models (LLMs) to self-correct code using iterative execution feedback. The researchers evaluated four state-of-the-art LLMs (DeepSeek-R1, DeepSeek-V3, GPT-o4-mini, GPT-4.1-mini) in two programming languages (Python, Java) on real-world LeetCode problems. The iterative refinement framework fed compiler errors and test-case failures back to the model for up to 10 iterations. Reasoning models such as DeepSeek-R1 and GPT-o4-mini improved consistently across iterations and made significantly better use of feedback than non-reasoning models. Syntactic and runtime errors were far more tractable (fix rates above 80%) than logical or algorithmic failures (fix rates below 35%), which exposes a current limitation of LLMs in deep algorithmic reasoning. The study also introduces new metrics, including Iterative Success Rate (ISR@k) and Median Iterations to Solve (MIS), for a more realistic evaluation.

Key takeaway

Machine Learning Engineers building LLM-driven code generation systems should integrate iterative feedback loops to improve code correctness beyond single-attempt performance. Focus on providing clear execution feedback for syntactic and runtime errors, since these are the most amenable to LLM self-correction. Deep algorithmic or logical errors remain challenging and call for alternative strategies or human intervention. Consider metrics such as ISR@k and MIS for a more comprehensive evaluation of your models' real-world utility.

Key insights

Iterative feedback loops significantly enhance LLM code correction, especially for reasoning models and for syntactic and runtime errors.

Principles

Reasoning capacity improves feedback utilization.
Explicit prompt guidance enhances code efficiency.
Error type dictates correction tractability.
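To act on the error-type principle, a pipeline first has to decide which kind of failure it is looking at. The sketch below is one rough heuristic for Python tracebacks, not the paper's taxonomy: the function name classify_failure, the three labels, and the rule that a failed assertion counts as a logical error are all assumptions made for illustration.

```python
def classify_failure(stderr: str) -> str:
    """Map a Python traceback to a coarse error category (hypothetical heuristic)."""
    text = stderr.strip()
    if not text:
        return "unknown"
    # The last traceback line has the form "ExceptionName: message" or just "ExceptionName".
    exception_name = text.splitlines()[-1].split(":", 1)[0].strip()
    if exception_name in {"SyntaxError", "IndentationError", "TabError"}:
        return "syntactic"
    if exception_name == "AssertionError":
        # Assumption: a failed test assertion means the code ran but gave a wrong answer.
        return "logical"
    return "runtime"
```

A label like this could be used to route logical failures to a different strategy or to a human reviewer, since the study reports that those are the hardest for models to fix on their own.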

Method

An automated feedback loop executes LLM-generated code, constructs prompts with failure messages, and provides this execution feedback to the LLM for iterative refinement over up to 10 turns.
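The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the names run_candidate, build_prompt, and refine are hypothetical, the generate callable stands in for whatever LLM client is used, and the prompt wording is invented. It runs Python candidates only and does no sandboxing, which a real system would need.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

MAX_TURNS = 10  # the study allows up to 10 iterations


def run_candidate(code: str, test_code: str, timeout: float = 10.0) -> Optional[str]:
    """Run candidate code plus tests in a subprocess; return None on success, else the failure text."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return f"Timed out after {timeout} seconds"
    if proc.returncode == 0:
        return None
    return proc.stderr.strip() or proc.stdout.strip() or f"Exit code {proc.returncode}"


def build_prompt(problem: str, previous_code: str, failure: str) -> str:
    """Assemble a follow-up prompt that carries the failure message (wording is an assumption)."""
    return (
        f"{problem}\n\nYour previous attempt:\n{previous_code}\n\n"
        f"It failed with:\n{failure}\n\nReturn a corrected solution."
    )


def refine(
    problem: str,
    test_code: str,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt in, code out
    max_turns: int = MAX_TURNS,
) -> Tuple[Optional[int], str]:
    """Return (turn at which the code passed, final code); the turn is None if never solved."""
    prompt, code = problem, ""
    for turn in range(1, max_turns + 1):
        code = generate(prompt)
        failure = run_candidate(code, test_code)
        if failure is None:
            return turn, code
        prompt = build_prompt(problem, code, failure)
    return None, code
```

Returning the turn number alongside the code keeps the loop useful for evaluation, because per-problem turn counts are what iteration-based metrics are computed from.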

In practice

Implement multi-turn feedback for LLM code generation.
Prioritize fixing syntactic and runtime errors first.
Use ISR@k and MIS for robust LLM evaluation.
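The summary names ISR@k and MIS without giving formulas, so the sketch below rests on an assumed reading: ISR@k is the fraction of problems solved within the first k iterations, and MIS is the median iteration count over the problems that were eventually solved. The function names and the input format (one entry per problem, None for unsolved) are hypothetical; check the paper for the exact definitions before comparing numbers.

```python
from statistics import median
from typing import Optional, Sequence


def isr_at_k(solved_at: Sequence[Optional[int]], k: int) -> float:
    """Fraction of problems solved within k iterations (assumed definition of ISR@k)."""
    if not solved_at:
        raise ValueError("solved_at must contain at least one problem")
    solved = sum(1 for turn in solved_at if turn is not None and turn <= k)
    return solved / len(solved_at)


def median_iterations_to_solve(solved_at: Sequence[Optional[int]]) -> Optional[float]:
    """Median iteration count over solved problems only (assumed definition of MIS)."""
    turns = [turn for turn in solved_at if turn is not None]
    return median(turns) if turns else None


# Example: five problems, solved at iterations 1, 3, never, 2, and 10.
results = [1, 3, None, 2, 10]
print(isr_at_k(results, 1), isr_at_k(results, 3), isr_at_k(results, 10))
print(median_iterations_to_solve(results))
```

Under this reading, ISR@1 corresponds to single-attempt accuracy, so plotting ISR@k against k shows how much of a model's final success rate comes from feedback rather than from the first try.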

Topics

Large Language Models
Code Generation
Iterative Refinement
Feedback Loops
Model Evaluation
Algorithmic Optimization

Code references

lezhangisu/LLM-Code-Correction

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.