ZeroCoder: Can LLMs Improve Code Generation Without Ground-Truth Supervision?

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, extended

Summary

ZeroCoder is a label-free co-evolutionary framework that improves both code generation and test generation in large language models without ground-truth supervision. It targets a bottleneck in Reinforcement Learning with Verifiable Rewards (RLVR), namely the cost of human-curated unit tests, by jointly training a Coder and a Tester. The framework executes self-generated code against self-generated tests and records the outcomes in a passing matrix (which sampled solutions pass which sampled tests). A selector then identifies consensus solutions and tests, and rewards are derived from them. ZeroCoder adds rank-based pre-filtering to remove low-information problems and a curriculum-based Tester objective that balances test validity against mutation-driven discriminativeness. It also introduces DyB4, a Bayesian selector that dynamically recalibrates its priors using as few as 10 labeled instances to counter "selector drift." On Qwen2.5-Coder-7B-Instruct, ZeroCoder improves code generation by up to 14.5% in the label-free setting and by up to 21.6% with DyB4, while test generation improves by 24.3%, approaching oracle-supervised performance.

Key takeaway

Machine learning engineers who need to improve LLM code generation in label-scarce environments should consider co-evolutionary frameworks such as ZeroCoder. Jointly training a code generator and a test generator on self-generated execution feedback improved code generation by up to 14.5% in the reported experiments without any ground-truth supervision. Adding dynamic selector calibration, such as DyB4 with as few as 10 labeled instances, raised that figure to 21.6% and brought results close to oracle-supervised training.
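
To make the idea of recalibrating a prior from a handful of labels concrete, here is a minimal sketch of a generic Beta-Bernoulli update. This summary does not describe DyB4's actual update rule, so everything below is an assumption for illustration: the `BetaPrior` class, the uniform starting prior, and the ten hypothetical labels (each marking whether a sampled solution turned out to be correct).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Hypothetical Beta prior over the chance that a sampled solution is correct."""
    alpha: float = 1.0  # pseudo-count of correct solutions (uniform prior by default)
    beta: float = 1.0   # pseudo-count of incorrect solutions

    def update(self, labels: list[bool]) -> "BetaPrior":
        """Standard conjugate update: add observed successes and failures."""
        successes = sum(labels)
        failures = len(labels) - successes
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the correctness rate."""
        return self.alpha / (self.alpha + self.beta)


# Ten hypothetical labeled instances (not data from the paper).
labels = [True, True, False, True, False, False, True, True, False, True]

prior = BetaPrior()
posterior = prior.update(labels)
print(posterior, round(posterior.mean, 3))
```

With so few labels the prior's pseudo-counts still carry real weight, which is one reason a selector of this kind might be re-estimated periodically during training rather than fixed once.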

Key insights

Co-evolving code and test generation with self-generated execution feedback significantly improves LLM performance without ground-truth supervision.

Method

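For each problem, ZeroCoder samples candidate solutions and tests, executes every solution against every test to form a passing matrix, applies a selector to identify the consensus subsets of solutions and tests, and derives role-specific rewards for the Coder and the Tester from those subsets. Rank-based pre-filtering removes low-information problems, and a curriculum governs the Tester's training objective.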
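
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the selector here is a simple agreement heuristic (group solutions by identical pass pattern and score each group by group size times number of tests passed) standing in for the Bayesian selector, the rewards are plain 0/1 membership indicators, and pre-filtering and the Tester curriculum are omitted. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Solution = Callable[[int], int]
Test = Callable[[Solution], bool]


def passing_matrix(solutions: List[Solution], tests: List[Test]) -> List[List[bool]]:
    """matrix[i][j] is True when solution i passes test j; crashes count as failures."""
    matrix = []
    for sol in solutions:
        row = []
        for test in tests:
            try:
                row.append(bool(test(sol)))
            except Exception:
                row.append(False)
        matrix.append(row)
    return matrix


def select_consensus(matrix: List[List[bool]]) -> Tuple[List[int], List[int]]:
    """Assumed heuristic selector: pick the pass pattern with the largest
    (number of agreeing solutions) x (number of tests passed) score."""
    groups = defaultdict(list)
    for i, row in enumerate(matrix):
        groups[tuple(row)].append(i)
    pattern, members = max(groups.items(), key=lambda kv: len(kv[1]) * sum(kv[0]))
    consensus_tests = [j for j, passed in enumerate(pattern) if passed]
    return members, consensus_tests


def derive_rewards(matrix, consensus_solutions, consensus_tests):
    """Assumed 0/1 rewards: Coder samples and Tester samples earn 1.0 for
    belonging to the consensus subset, 0.0 otherwise."""
    sol_set, test_set = set(consensus_solutions), set(consensus_tests)
    n_tests = len(matrix[0]) if matrix else 0
    coder = [1.0 if i in sol_set else 0.0 for i in range(len(matrix))]
    tester = [1.0 if j in test_set else 0.0 for j in range(n_tests)]
    return coder, tester


def make_test(inp: int, expected: int) -> Test:
    return lambda sol: sol(inp) == expected


# Toy problem: absolute value. The third solution is buggy and the third test is invalid.
solutions = [abs, lambda x: x if x >= 0 else -x, lambda x: x]
tests = [make_test(-3, 3), make_test(2, 2), make_test(0, 1)]

matrix = passing_matrix(solutions, tests)
consensus_solutions, consensus_tests = select_consensus(matrix)
coder_rewards, tester_rewards = derive_rewards(matrix, consensus_solutions, consensus_tests)
print(matrix, consensus_solutions, consensus_tests, coder_rewards, tester_rewards)
```

In this toy case the two correct solutions agree on the two valid tests, so they form the consensus group, while the buggy solution and the invalid test receive zero reward. A heuristic like this can lock onto a wrong majority, which is the kind of failure a calibrated selector is meant to reduce.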

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.