Probe-and-Refine Tuning of Repository Guidance for Coding Agents
Summary
Probe-and-Refine Tuning is a new procedure for improving LLM-based coding agents by iteratively refining their repository guidance files. It uses synthetic bug-fix probes to diagnose and patch an AGENTS.md file through single-shot LLM calls, with no agent loop or tool use during tuning. The research addresses the debate over whether LLM-generated guidance helps or harms agent performance and concludes that what matters is how the guidance is produced. On SWE-bench Verified, across four trials with Qwen3.5-35B-A3B at 200 steps, probe-and-refine reached a 33.0% mean resolve rate, compared with 28.3% for static knowledge bases and 25.5% for unguided baselines (p < 0.001). The gain comes from coverage: refined guidance produced evaluable patches for 14.5 percentage points more instances, while per-patch precision showed no significant change at approximately 59% (p = 0.119). Further experiments indicate that guidance lets agents make productive use of larger step budgets, and that tuning becomes less effective with models that cannot generate sufficiently diagnostic output, although precision holds.
Key takeaway
AI engineers building LLM-based coding agents should prioritize creating and iteratively refining repository guidance files such as AGENTS.md. Probe-and-refine tuning in particular can significantly raise an agent's bug-fix resolve rate by improving its ability to locate the correct files. Make sure the LLM you choose can generate diagnostic output, since this directly affects the success of the tuning loop and the agent's productive use of larger step budgets.
Key insights
Iteratively refining coding-agent guidance with synthetic bug-fix probes significantly boosts resolve rates, and it does so by improving coverage.
Principles
- Guidance quality is key for coding agent performance.
- Coverage, not precision, drives guidance-based improvements.
- Diagnostic LLM output is vital for effective tuning loops.
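The coverage and precision distinction above can be made concrete with a small calculation. The sketch below assumes that resolve rate decomposes as coverage (the share of instances with an evaluable patch) times precision (the share of evaluable patches that resolve the issue). The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
from typing import Tuple


def decompose(n_instances: int, n_evaluable: int, n_resolved: int) -> Tuple[float, float, float]:
    """Return (coverage, precision, resolve_rate) for one evaluation run.

    Assumed definitions (not taken from the original article):
      coverage     = evaluable patches / all instances
      precision    = resolved instances / evaluable patches
      resolve_rate = coverage * precision = resolved / all instances
    """
    coverage = n_evaluable / n_instances
    precision = n_resolved / n_evaluable if n_evaluable else 0.0
    return coverage, precision, coverage * precision


# Hypothetical counts: precision is identical in both runs,
# so the whole resolve-rate difference comes from coverage.
baseline = decompose(n_instances=100, n_evaluable=40, n_resolved=24)
refined = decompose(n_instances=100, n_evaluable=55, n_resolved=33)
```

With precision held fixed, every additional instance that receives an evaluable patch contributes to the resolve rate at the same per-patch success rate, which is the pattern the principles describe.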
Method
Probe-and-refine tuning iteratively diagnoses and patches repository guidance files using synthetic bug-fix probes and single-shot LLM calls, without agent loops.
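One way to picture the loop described above is the following minimal sketch. It is not the authors' implementation: the three-step split (attempt, diagnose, patch), the prompt wording, the `LLM` callable type, and the `stub_llm` stand-in are all assumptions made for illustration. The only properties taken from the description are that each step is a single-shot LLM call and that no agent loop or tool use is involved.

```python
from typing import Callable, List

# Hypothetical single-shot interface: one prompt in, one completion out.
LLM = Callable[[str], str]


def probe_and_refine(guidance: str, probes: List[str], llm: LLM, rounds: int = 1) -> str:
    """Refine a guidance file by diagnosing failures on synthetic bug-fix probes."""
    for _ in range(rounds):
        for probe in probes:
            # 1. Probe: one call asking how the bug would be approached.
            attempt = llm(f"ATTEMPT\nGuidance:\n{guidance}\n\nBug report:\n{probe}")
            # 2. Diagnose: one call asking what the guidance lacks.
            diagnosis = llm(
                f"DIAGNOSE\nGuidance:\n{guidance}\n\nBug report:\n{probe}\n\nAttempt:\n{attempt}"
            )
            if diagnosis.strip().upper() == "NONE":
                continue  # nothing to fix for this probe
            # 3. Patch: one call that returns the full revised guidance.
            guidance = llm(f"PATCH\nGuidance:\n{guidance}\n\nDiagnosis:\n{diagnosis}")
    return guidance


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("ATTEMPT"):
        return "I would edit src/app.py"
    if prompt.startswith("DIAGNOSE"):
        return "NONE" if "tests/" in prompt else "Guidance does not say where tests live."
    # PATCH: append one line to the guidance embedded in the prompt.
    old = prompt.split("Guidance:\n", 1)[1].split("\n\nDiagnosis:", 1)[0]
    return old + "\n- Tests live in tests/."


tuned = probe_and_refine(
    "# AGENTS.md",
    ["Crash when parsing empty config", "Off-by-one in pagination"],
    stub_llm,
)
```

With the stub, the first probe triggers a patch and the second finds nothing left to diagnose. In a real setup `llm` would wrap a model call, and the loop only makes progress if the diagnosis step returns something specific enough to act on.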
In practice
- Implement AGENTS.md files for LLM coding agents.
- Focus guidance on improving file reach, not just code quality.
- Use an LLM that can produce diagnostic output for guidance refinement.
Topics
- LLM Coding Agents
- Repository Guidance
- Probe-and-Refine Tuning
- SWE-bench Verified
- Qwen3.5-35B-A3B
- Software Engineering
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.