Probe-and-Refine Tuning of Repository Guidance for Coding Agents

2026-06-18 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Probe-and-Refine Tuning is a new procedure for improving LLM-based coding agents by iteratively refining their repository guidance files. It uses synthetic bug-fix probes to diagnose and patch an AGENTS.md file through single-shot LLM calls, so the tuning process needs no agent loop and no tool use. The work addresses the debate over whether LLM-generated guidance helps or harms agent performance and concludes that what matters is how the guidance is produced. On SWE-bench Verified, across four trials with Qwen3.5-35B-A3B and a 200-step budget, probe-and-refine achieved a 33.0% mean resolve rate, compared with 28.3% for static knowledge bases and 25.5% for unguided baselines (p < 0.001). The gain comes from coverage: refined guidance produced evaluable patches for 14.5 percentage points more instances, while per-patch precision stayed at approximately 59% (p = 0.119). Further experiments indicate that guidance lets agents make productive use of larger step budgets, and that tuning becomes less effective with models that cannot generate sufficiently diagnostic output, although precision holds.
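The coverage-versus-precision result follows from a simple identity. If coverage is the share of instances with an evaluable patch and precision is the share of those patches that resolve the issue, then resolve rate is their product, and a change in resolve rate splits into a coverage term and a precision term. The sketch below assumes these definitions (the summary does not spell them out); the `Outcome` class, the `decompose` function, and all counts are hypothetical illustrations, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Aggregate counts for one guidance condition (illustrative numbers only)."""
    total: int      # instances attempted
    evaluable: int  # instances where the agent produced an evaluable patch
    resolved: int   # instances whose patch resolved the issue

    @property
    def coverage(self) -> float:
        return self.evaluable / self.total

    @property
    def precision(self) -> float:
        return self.resolved / self.evaluable

    @property
    def resolve_rate(self) -> float:
        # resolved/total == (evaluable/total) * (resolved/evaluable)
        return self.coverage * self.precision


def decompose(before: Outcome, after: Outcome) -> tuple[float, float]:
    """Split the resolve-rate change into a coverage term and a precision term.

    c1*p1 - c0*p0 == (c1 - c0) * p0 + c1 * (p1 - p0)
    """
    coverage_term = (after.coverage - before.coverage) * before.precision
    precision_term = after.coverage * (after.precision - before.precision)
    return coverage_term, precision_term


unguided = Outcome(total=200, evaluable=100, resolved=59)  # hypothetical counts
tuned = Outcome(total=200, evaluable=130, resolved=77)     # hypothetical counts
cov, prec = decompose(unguided, tuned)
print(f"coverage term {cov:.4f}, precision term {prec:.4f}")
# coverage term 0.0885, precision term 0.0015
```

With precision nearly flat, almost all of the 9-point gain in this toy example comes from the coverage term, which is the pattern the summary describes.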

Key takeaway

For AI engineers building LLM-based coding agents: prioritize adding repository guidance files such as AGENTS.md and refining them iteratively. A probe-and-refine tuning method can significantly raise your agent's bug-fix resolve rate, chiefly by improving its ability to locate the correct files. Make sure the LLM used for refinement can generate diagnostic output, since this directly affects the tuning loop's success and how productively your agent uses larger step budgets.

Key insights

Iterative, synthetic bug-fix probing significantly refines LLM coding agent guidance, boosting resolve rates by improving coverage.
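The summary does not say how the synthetic probes are built. One plausible construction, shown here purely as an assumption, is mutation injection: take a repository file, flip a `<` comparison to `<=` to create an off-by-one bug, and record the file path as ground truth for whether the agent reaches the right file. `Probe` and `make_probe` are hypothetical names; the sketch needs Python 3.9+ for `ast.unparse`.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    path: str          # file the bug was injected into (ground truth for reach)
    buggy_source: str  # mutated file contents handed to the agent


class _FlipFirstLessThan(ast.NodeTransformer):
    """Turn the first `<` comparison found (post-order) into `<=`."""

    def __init__(self) -> None:
        self.done = False

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)  # visit nested comparisons first
        # Only the first operator of a chained comparison is considered.
        if not self.done and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.LtE()
            self.done = True
        return node


def make_probe(path: str, source: str) -> Optional[Probe]:
    """Return a probe with one injected off-by-one bug, or None if no `<` exists."""
    tree = ast.parse(source)
    flipper = _FlipFirstLessThan()
    mutated = flipper.visit(tree)
    if not flipper.done:
        return None
    return Probe(path=path, buggy_source=ast.unparse(mutated))


src = "def clamp(i, n):\n    return i if i < n else n - 1\n"
probe = make_probe("pkg/util.py", src)
print(probe.buggy_source)
```

Because the mutation is known, the injected file is a free label: a probe succeeds on reach if the agent edits `probe.path`, independent of whether its fix is correct.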

Principles

Guidance quality is key for coding agent performance.
Coverage, not precision, drives guidance-based improvements.
Diagnostic LLM output is vital for effective tuning loops.

Method

Probe-and-refine tuning iteratively diagnoses and patches repository guidance files using synthetic bug-fix probes and single-shot LLM calls, without agent loops.
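A minimal sketch of such a loop follows, under loudly stated assumptions: each probe is scored by one single-shot call that names the file it would edit, misses are explained by one diagnosis call each, and one further call rewrites the whole AGENTS.md. The prompts, the `LOCATE`/`DIAGNOSE`/`PATCH` tags, the function names, and `fake_llm` (a toy stand-in for a real model) are all hypothetical; the source does not describe its actual prompts or probe format.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed interface: one single-shot completion per call (prompt in, text out).
# No tools and no multi-turn agent loop are used anywhere in tuning.
LLM = Callable[[str], str]


@dataclass
class Probe:
    issue: str      # synthetic bug report shown to the model
    gold_file: str  # file the bug was injected into


def locate(llm: LLM, guidance: str, probe: Probe) -> str:
    """Single-shot probe: which file would the model edit, given the guidance?"""
    return llm(f"LOCATE\nAGENTS.md:\n{guidance}\n\nIssue:\n{probe.issue}\n"
               "Answer with one file path.").strip()


def diagnose(llm: LLM, guidance: str, probe: Probe, answer: str) -> str:
    """Single-shot diagnosis of why the guidance pointed to the wrong file."""
    return llm(f"DIAGNOSE\nAGENTS.md:\n{guidance}\n\nIssue:\n{probe.issue}\n"
               f"Model chose {answer}; the bug was in {probe.gold_file}.\n"
               "What is missing or misleading in the guidance?")


def patch(llm: LLM, guidance: str, diagnoses: list[str]) -> str:
    """Single-shot rewrite of the whole guidance file from the diagnoses."""
    notes = "\n".join(f"- {d}" for d in diagnoses)
    return llm(f"PATCH\nAGENTS.md:\n{guidance}\n\nDiagnoses:\n{notes}\n"
               "Return the complete revised AGENTS.md.")


def probe_and_refine(llm: LLM, guidance: str, probes: list[Probe],
                     rounds: int = 3) -> str:
    """Alternate probing and patching until every probe reaches its gold file."""
    for _ in range(rounds):
        misses = []
        for p in probes:
            answer = locate(llm, guidance, p)
            if answer != p.gold_file:
                misses.append((p, answer))
        if not misses:
            break
        diagnoses = [diagnose(llm, guidance, p, a) for p, a in misses]
        guidance = patch(llm, guidance, diagnoses)
    return guidance


def fake_llm(prompt: str) -> str:
    """Toy model: 'finds' src/parser.py only if the guidance mentions it."""
    kind, _, body = prompt.partition("\n")
    if kind == "LOCATE":
        return "src/parser.py" if "src/parser.py" in body else "src/main.py"
    if kind == "DIAGNOSE":
        return "Guidance never says that parsing code lives in src/parser.py."
    if kind == "PATCH":
        old = body.removeprefix("AGENTS.md:\n").split("\n\nDiagnoses:")[0]
        return old + "\n- Parsing bugs: start in src/parser.py."
    raise ValueError(f"unexpected prompt kind: {kind}")


probes = [Probe("Quoted CSV fields lose their closing quote.", "src/parser.py")]
tuned = probe_and_refine(fake_llm, "# AGENTS.md\n- Run tests with pytest.", probes)
print(tuned)
```

With the toy model, the first round misses, the patch adds a pointer to `src/parser.py`, and the second round passes, so the loop stops early. With a real model, the quality of the `diagnose` output is what the patch step has to work with, which is consistent with the summary's finding that tuning degrades when the model's output is not diagnostic enough.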

In practice

Implement AGENTS.md files for LLM coding agents.
Focus guidance on improving file reach, not just code quality.
Refine guidance with an LLM that produces sufficiently diagnostic output.

Topics

LLM Coding Agents
Repository Guidance
Probe-and-Refine Tuning
SWE-bench Verified
Qwen3.5-35B-A3B
Software Engineering

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.