An Iterative Test-and-Repair Framework for Competitive Code Generation
Summary
FixAudit is an iterative test-and-repair framework for competitive code generation that addresses limitations of previous methods such as CURE. It trains a single shared model in two roles: a Fixer, which repairs code based on failing tests, and an Auditor, which reads the candidate code and generates new tests designed to expose its bugs. Training follows a four-stage pipeline: execution-aligned supervised fine-tuning (SFT), followed by reinforcement learning (RL) stages for initial repair, targeted test generation, and closed-loop refinement. On APPS, CodeContests, and xCodeEval, FixAudit built on Qwen2.5-Coder-7B-Instruct outperforms the larger Qwen2.5-Coder-32B-Instruct baseline, evaluated zero-shot, by 24.9% in average Pass@1 and 40.5% in average AvgPassRatio. It also improves average Pass@1 by 35.1% to 36.8% over strong 7B baselines such as Specine and CURE.
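To make the two reported metrics concrete, here is a minimal sketch of how Pass@1 and AvgPassRatio are commonly computed when one solution is sampled per problem. It assumes Pass@1 is the fraction of problems whose solution passes every test and AvgPassRatio is the mean per-problem fraction of tests passed; the function names and the input layout are illustrative, and the paper's exact evaluation protocol may differ.

```python
from typing import List


def pass_at_1(results: List[List[bool]]) -> float:
    """Fraction of problems whose single sampled solution passes all tests.

    `results[i][j]` is True if the solution to problem i passes test j
    (an assumed layout for this sketch).
    """
    return sum(all(tests) for tests in results) / len(results)


def avg_pass_ratio(results: List[List[bool]]) -> float:
    """Mean over problems of the fraction of tests the solution passes."""
    return sum(sum(tests) / len(tests) for tests in results) / len(results)


# Three hypothetical problems: fully solved, half solved, unsolved.
example = [
    [True, True],
    [True, False],
    [False, False, False, False],
]
print(pass_at_1(example))       # strict metric: only the first problem counts
print(avg_pass_ratio(example))  # partial credit for the second problem
```

AvgPassRatio gives partial credit for nearly correct programs, which is why it can move differently from Pass@1 when a repair step fixes some but not all failing tests.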
Key takeaway
If you are an AI scientist or machine learning engineer developing code generation models, consider integrating an iterative test-and-repair mechanism. FixAudit shows that a code-aware Auditor for targeted bug exposure, coupled with a Fixer for incremental repair, outperforms larger zero-shot models and existing frameworks. A multi-stage pipeline that starts with execution-aligned SFT and continues with RL can build robust debugging capabilities and reach higher Pass@1 scores with fewer iterations.
Key insights
Iterative, code-aware test-and-repair cycles significantly enhance competitive code generation performance.
Principles
- Execution reasoning is foundational for debugging agents.
- Targeted test generation requires candidate code analysis.
- Program repair should be incremental, preserving correct logic.
Method
FixAudit uses a four-stage training pipeline: SFT for execution reasoning, then RL stages that train the Fixer (repair guided by failing tests) and the Auditor (code-aware, bug-revealing test generation) in iterative cycles, refined with DAPO.
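The inference-time interaction between the two roles can be sketched as a simple loop. This is an illustrative sketch, not the paper's implementation: `repair_loop`, `failing_cases`, the `Case` type, and the toy `auditor` and `fixer` callables are hypothetical stand-ins for the model's two roles, candidate programs are represented as Python callables from input string to output string, and the Auditor is assumed to return input and expected-output pairs.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]            # (input, expected output), assumed format
Program = Callable[[str], str]    # stand-in for a candidate solution


def failing_cases(program: Program, cases: List[Case]) -> List[Case]:
    """Return the cases the program gets wrong (crashes count as failures)."""
    failed = []
    for inp, expected in cases:
        try:
            ok = program(inp) == expected
        except Exception:
            ok = False
        if not ok:
            failed.append((inp, expected))
    return failed


def repair_loop(
    program: Program,
    auditor: Callable[[Program], List[Case]],
    fixer: Callable[[Program, List[Case]], Program],
    max_rounds: int = 3,
) -> Tuple[Program, int]:
    """Alternate Auditor and Fixer until no known test fails."""
    cases: List[Case] = []
    for round_idx in range(max_rounds):
        # Auditor reads the current candidate and proposes targeted tests.
        for case in auditor(program):
            if case not in cases:
                cases.append(case)
        failed = failing_cases(program, cases)
        if not failed:
            return program, round_idx
        # Fixer repairs the candidate using only the failing tests.
        program = fixer(program, failed)
    return program, max_rounds


# Toy demo: the task is abs(x); the first candidate forgets negatives.
buggy: Program = lambda s: s
correct: Program = lambda s: str(abs(int(s)))
toy_auditor = lambda prog: [("-3", "3"), ("4", "4")]
toy_fixer = lambda prog, failed: correct

final, rounds = repair_loop(buggy, toy_auditor, toy_fixer)
print(final("-3"), rounds)
```

Accumulating tests across rounds means a later repair is checked against every earlier bug-revealing test, which is one simple way to catch regressions inside the loop.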
In practice
- Implement a dedicated test generator that reads candidate code.
- Design rewards to prevent regressions during code repair.
- Use SFT to build execution reasoning before RL.
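One way to act on the regression point above is a repair reward that credits newly passing tests and penalizes tests that passed before the repair but fail after it. The sketch below is a hypothetical reward shape for illustration only; `repair_reward` and `regression_penalty` are assumed names, and the reward used in FixAudit may be defined differently.

```python
from typing import List


def repair_reward(
    before: List[bool],
    after: List[bool],
    regression_penalty: float = 1.0,
) -> float:
    """Score a repair from per-test pass flags before and after the fix.

    Reward = (share of failing tests fixed)
             - regression_penalty * (share of passing tests broken).
    This is an assumed formulation, not the paper's.
    """
    if len(before) != len(after):
        raise ValueError("before and after must cover the same tests")
    n_fail = sum(1 for b in before if not b)
    n_pass = len(before) - n_fail
    fixed = sum(1 for b, a in zip(before, after) if not b and a)
    broken = sum(1 for b, a in zip(before, after) if b and not a)
    gain = fixed / n_fail if n_fail else 0.0
    loss = broken / n_pass if n_pass else 0.0
    return gain - regression_penalty * loss


# Fixes both failing tests but breaks one of two passing tests.
print(repair_reward([True, True, False, False], [True, False, True, True]))
# Fixes one failing test and breaks nothing.
print(repair_reward([True, True, False, False], [True, True, True, False]))
```

Under this shape a rewrite that fixes everything but breaks working logic scores no better than a careful partial fix, which pushes the policy toward incremental repair.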
Topics
- Competitive Programming
- Code Generation
- Large Language Models
- Reinforcement Learning
- Program Repair
- Test Generation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.