Probe-and-Refine Tuning of Repository Guidance for Coding Agents

2026-06-19 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Probe-and-refine tuning is a procedure for improving LLM-based coding agents by iteratively refining repository guidance files. It uses synthetic bug-fix probes to diagnose and patch the guidance through single-shot LLM calls, with no agent loops or tool use during tuning. Evaluated on SWE-bench Verified across four trials with Qwen3.5-35B-A3B at a 200-step budget, probe-and-refine achieved a 33.0% mean resolve rate, significantly above both a static knowledge base (28.3%) and an unguided baseline (25.5%) ($p<0.001$). The gain is attributed to a 14.5 percentage point increase in evaluable patch coverage; per-patch precision stayed constant at approximately 59% ($p=0.119$). The research also finds that effective guidance lets agents make productive use of larger step budgets, and that both the tuning process and guidance transfer fail with capacity-constrained models such as NVIDIA-Nemotron-3-Nano-30B-A3B.
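One way to read the coverage and precision figures: if resolve rate is treated as the product of evaluable patch coverage (the share of tasks with an evaluable patch) and per-patch precision (the share of those patches that resolve the task), then flat precision means any gain has to come from coverage. The sketch below assumes that decomposition, which the summary implies but does not spell out. The function name and all counts are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def decompose(tasks: int, evaluable: int, resolved: int) -> dict:
    """Split a resolve rate into coverage and precision (assumed definitions)."""
    coverage = evaluable / tasks        # share of tasks with an evaluable patch
    precision = resolved / evaluable    # share of evaluable patches that resolve
    return {
        "coverage": coverage,
        "precision": precision,
        "resolve_rate": resolved / tasks,  # equals coverage * precision
    }


# Made-up counts for illustration only (NOT results from the paper).
baseline = decompose(tasks=200, evaluable=80, resolved=48)
guided = decompose(tasks=200, evaluable=120, resolved=72)

coverage_gain = guided["coverage"] - baseline["coverage"]
resolve_gain = guided["resolve_rate"] - baseline["resolve_rate"]

# With precision unchanged, the resolve-rate gain is precision * coverage gain.
print(f"coverage gain: {coverage_gain:.2f}")
print(f"resolve-rate gain: {resolve_gain:.2f}")
```

In this toy case precision is identical in both arms, so the entire resolve-rate difference is explained by the additional evaluable patches.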

Key takeaway

AI engineers aiming to improve the reliability of LLM-based coding agents should prioritize iteratively refining repository guidance. Probe-and-refine tuning can significantly increase an agent's ability to produce evaluable patches, especially for localization-dependent fixes. Make sure the agent's step budget matches the complexity of the workflow the guidance prescribes, and always tune guidance with the specific model that will execute it, because cross-model transfer can degrade performance.

Key insights

Iteratively refined, failure-informed guidance significantly improves LLM coding agent performance by boosting patch coverage.

Principles

Instruction quality is a first-order determinant of coding agent reliability.
Guidance converts additional agent steps into productive work.
Guidance artifacts encode model-specific behavioral calibration.

Method

Probe-and-refine tuning generates synthetic bug-fix probes, attempts a solution to each, judges the attempts, and aggregates the resulting diagnostics into mechanical edits to the repository guidance files. Each step is a single-shot LLM call, with no agent loop or tool use.
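The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording, the `PASS` verdict convention, and every function name are hypothetical, and a stub stands in for the model so the sketch runs offline.

```python
from typing import Callable, List

# Hypothetical single-shot model interface: prompt in, text out.
LLM = Callable[[str], str]


def probe_and_refine(guidance: str, llm: LLM, rounds: int = 2,
                     probes_per_round: int = 3) -> str:
    """Refine guidance text using only single-shot calls (no agent loop, no tools)."""
    for _ in range(rounds):
        diagnostics: List[str] = []
        for i in range(probes_per_round):
            # 1. Generate a synthetic bug-fix probe.
            probe = llm(f"GENERATE_PROBE #{i}\nGuidance:\n{guidance}")
            # 2. Attempt a solution under the current guidance.
            attempt = llm(f"ATTEMPT\nGuidance:\n{guidance}\nProbe:\n{probe}")
            # 3. Judge the attempt; keep the verdict only if it failed.
            verdict = llm(f"JUDGE\nProbe:\n{probe}\nAttempt:\n{attempt}")
            if not verdict.startswith("PASS"):
                diagnostics.append(verdict)
        if not diagnostics:
            break  # every probe passed, nothing left to patch
        # 4. Aggregate diagnostics and ask for an edited guidance file.
        summary = "\n".join(f"- {d}" for d in diagnostics)
        guidance = llm(f"REFINE\nGuidance:\n{guidance}\nDiagnostics:\n{summary}")
    return guidance


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a model: attempts fail until guidance mentions tests."""
    kind = prompt.split("\n", 1)[0]
    if kind.startswith("GENERATE_PROBE"):
        return "Synthetic bug: off-by-one in pagination helper"
    if kind == "ATTEMPT":
        guidance_part = prompt.split("Probe:")[0]
        return "PATCH+TESTS" if "run the tests" in guidance_part else "PATCH-ONLY"
    if kind == "JUDGE":
        if "PATCH+TESTS" in prompt:
            return "PASS"
        return "FAIL: attempt did not run the tests"
    if kind == "REFINE":
        current = prompt.split("Guidance:\n", 1)[1].split("\nDiagnostics:", 1)[0]
        return current + "\nAlways run the tests before submitting a patch."
    raise ValueError(f"unexpected prompt kind: {kind}")


tuned = probe_and_refine("Fix bugs carefully.", stub_llm)
print(tuned)
```

With the stub, the first round fails every probe and appends one rule to the guidance; the second round passes and the loop stops early. Swapping `stub_llm` for a real client is where the model-specific behavior noted in the principles would enter.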

In practice

Use synthetic probes to iteratively refine repository guidance.
Match agent step budget to guidance workflow complexity.
Tune guidance with the specific LLM that will consume it.

Topics

LLM Coding Agents
Repository Guidance
Probe-and-Refine Tuning
SWE-bench Verified
Qwen3.5-35B-A3B
Model Calibration

Code references

asashepard/probe-and-refine-tuning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.