An Iterative Test-and-Repair Framework for Competitive Code Generation

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

FixAudit is an iterative test-and-repair framework for competitive code generation that addresses limitations of previous methods such as CURE. It trains a single shared model in two roles: a Fixer, which repairs code based on failing tests, and an Auditor, which reads the candidate code and generates new tests designed to expose its bugs. Training follows a four-stage pipeline: execution-aligned supervised fine-tuning (SFT), followed by three reinforcement learning (RL) stages for initial repair, targeted test generation, and closed-loop refinement. On APPS, CodeContests, and xCodeEval, FixAudit built on Qwen2.5-Coder-7B-Instruct surpasses the larger Qwen2.5-Coder-32B-Instruct baseline, evaluated zero-shot, by 24.9% in average Pass@1 and 40.5% in average AvgPassRatio. Compared with strong 7B baselines such as Specine and CURE, it improves average Pass@1 by 35.1% to 36.8%.
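The two metrics named above can be made concrete with a minimal sketch. It assumes the definitions commonly used in code generation benchmarks: Pass@1 is the fraction of problems whose single generated solution passes every test, and AvgPassRatio is the mean per-problem fraction of tests passed. The summary does not spell out the paper's exact formulas, so the function names and the input format (one list of pass/fail booleans per problem) are illustrative assumptions.

```python
from typing import List


def pass_at_1(results: List[List[bool]]) -> float:
    """Fraction of problems whose one candidate passes all of its tests.

    Assumed input format: results[i][j] is True when the candidate for
    problem i passes test j. A problem with no tests counts as not solved.
    """
    if not results:
        return 0.0
    solved = sum(1 for r in results if r and all(r))
    return solved / len(results)


def avg_pass_ratio(results: List[List[bool]]) -> float:
    """Mean over problems of the fraction of tests the candidate passes.

    A problem with no tests contributes a ratio of 0.0 (an assumption).
    """
    if not results:
        return 0.0
    ratios = [sum(r) / len(r) if r else 0.0 for r in results]
    return sum(ratios) / len(ratios)


# Three hypothetical problems: fully solved, half solved, unsolved.
example = [[True, True], [True, False], [False, False, False, False]]
print(pass_at_1(example), avg_pass_ratio(example))
```

On the example, one of three problems is fully solved, so Pass@1 is one third, while the per-problem ratios 1.0, 0.5, and 0.0 average to 0.5. AvgPassRatio gives partial credit that Pass@1 does not, which is why the two numbers can move differently.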

Key takeaway

AI scientists and machine learning engineers developing code generation models should consider integrating iterative test-and-repair mechanisms. FixAudit demonstrates that a code-aware Auditor for targeted bug exposure, coupled with a Fixer for incremental repair, outperforms both a larger zero-shot model and existing 7B frameworks on the reported benchmarks. A multi-stage pipeline that starts with execution-aligned SFT and continues with RL can build robust debugging capabilities and reach higher Pass@1 scores with fewer iterations.

Key insights

Iterative, code-aware test-and-repair cycles significantly enhance competitive code generation performance.

Method

FixAudit uses a four-stage training pipeline. The first stage is SFT for execution reasoning. The later RL stages train iterative cycles between the Fixer, which repairs code using failing tests, and the Auditor, which generates code-aware, bug-revealing tests. These cycles are refined with DAPO.

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.