Probe-and-Refine Tuning of Repository Guidance for Coding Agents

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Probe-and-refine tuning is a procedure for improving LLM-based coding agents by iteratively refining repository guidance files. It uses synthetic bug-fix probes to diagnose weaknesses in the guidance and patches it through single-shot LLM calls, with no agent loops or tool use during tuning. On SWE-bench Verified, across four trials with Qwen3.5-35B-A3B at 200 steps, probe-and-refine reached a 33.0% mean resolve rate, significantly above a static knowledge base (28.3%) and an unguided baseline (25.5%) ($p<0.001$). The gain is attributed to a 14.5 percentage point increase in evaluable patch coverage; per-patch precision stayed at approximately 59%, with no significant change ($p=0.119$). The research also finds that effective guidance lets agents make productive use of larger step budgets, and that both the tuning process and guidance transfer fail with capacity-constrained models such as NVIDIA-Nemotron-3-Nano-30B-A3B.
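
One way to read the coverage and precision figures above is as a product decomposition. This is a sketch under an assumed definition, not a formula taken from the paper: let $C$ be the fraction of tasks for which the agent produces an evaluable patch and $P$ the fraction of evaluable patches that resolve their task, so that the resolve rate is $R = C \cdot P$.

$$\Delta R = C_{\text{new}} P_{\text{new}} - C_{\text{old}} P_{\text{old}} \approx P \, \Delta C \quad \text{when } P_{\text{new}} \approx P_{\text{old}} = P$$

Under that assumption, a roughly constant $P$ means the change in resolve rate tracks the change in coverage, which is consistent with the summary attributing the gain to coverage and not to precision.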

Key takeaway

AI engineers aiming to improve the reliability of LLM-based coding agents should prioritize iteratively refining repository guidance. Probe-and-refine tuning can significantly increase an agent's ability to produce evaluable patches, especially for localization-dependent fixes. Match the agent's step budget to the complexity of the workflow the guidance prescribes, and tune guidance with the same model that will execute it, since cross-model transfer can degrade performance.

Key insights

Iteratively refined, failure-informed guidance significantly improves LLM coding agent performance by boosting patch coverage.

Principles

Method

Probe-and-refine tuning generates synthetic bug-fix probes, attempts a solution to each, judges the attempts, and aggregates the resulting diagnostics into mechanical edits to the repository guidance files, all via single-shot LLM calls.
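
The four stages described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the `llm` callable, the prompt prefixes (`GENERATE_PROBE`, `ATTEMPT`, `JUDGE`, `REFINE`), the `PASS` verdict convention, and the `n_probes` and `rounds` parameters are all hypothetical names introduced here. The `fake_llm` stub stands in for a real model so the sketch runs offline.

```python
from typing import Callable, List

# Hypothetical type: one prompt in, one completion out (single-shot, no tools).
LLM = Callable[[str], str]


def probe_and_refine(guidance: str, llm: LLM, n_probes: int = 3, rounds: int = 2) -> str:
    """Iteratively patch a guidance file using failure diagnostics from probes."""
    for _ in range(rounds):
        diagnostics: List[str] = []
        for i in range(n_probes):
            # 1. Generate a synthetic bug-fix probe.
            probe = llm(f"GENERATE_PROBE #{i}\nGuidance:\n{guidance}")
            # 2. Attempt a solution with the current guidance in context.
            attempt = llm(f"ATTEMPT\nGuidance:\n{guidance}\nProbe:\n{probe}")
            # 3. Judge the attempt; anything not starting with PASS is a diagnostic.
            verdict = llm(f"JUDGE\nProbe:\n{probe}\nAttempt:\n{attempt}")
            if not verdict.startswith("PASS"):
                diagnostics.append(verdict)
        if not diagnostics:
            break  # every probe passed, nothing left to refine
        # 4. Aggregate diagnostics and rewrite the guidance in one call.
        guidance = llm(
            "REFINE\nGuidance:\n" + guidance + "\nDiagnostics:\n" + "\n".join(diagnostics)
        )
    return guidance


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model, used only to exercise the loop."""
    if prompt.startswith("GENERATE_PROBE"):
        return "bug in parser"
    if prompt.startswith("ATTEMPT"):
        return "patch with tests" if "run tests" in prompt else "patch"
    if prompt.startswith("JUDGE"):
        return "PASS" if "patch with tests" in prompt else "FAIL: did not run tests"
    # REFINE: return the edited guidance text.
    return "Use pytest. Always run tests before submitting."


tuned = probe_and_refine("Use pytest.", fake_llm)
```

With the stub, the first round fails every probe, the guidance is rewritten once, and the second round passes and exits early. A real setup would replace `fake_llm` with a call to the model that will later run the agent, in line with the takeaway about tuning on the executing model.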

In practice

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.