SWE-Explore: Benchmarking How Coding Agents Explore Repositories

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

SWE-Explore is a new benchmark designed to isolate and evaluate the repository exploration capabilities of coding agents, a fine-grained aspect often obscured by holistic pass/fail metrics in existing benchmarks like SWE-bench. It challenges explorers to return a ranked list of relevant code regions for a given issue and repository, adhering to a fixed line budget. The benchmark encompasses 848 issues across 10 programming languages and 203 open-source repositories, with ground truth derived from successful agent trajectories. Evaluation focuses on coverage, ranking, and context-efficiency, with these metrics shown to strongly predict downstream repair success. Findings indicate that agentic explorers significantly outperform classical retrieval methods, excelling in file-level localization, but often remain recall-limited at the line level.

Key takeaway

For AI scientists developing coding agents, it is crucial to treat repository exploration as a distinct capability. The benchmark shows that agents excel at file-level localization but often struggle with line-level recall. Prioritize your agents' ability to surface precise, relevant code spans early in their ranked output: missing critical context hurts repair success more than including a moderate amount of irrelevant context. Focus on line-level coverage and context efficiency to build more robust coding agents.

Key insights

SWE-Explore benchmarks coding agents' line-level repository exploration, isolating it from end-to-end repair outcomes.

Method

Explorers return ranked code regions for an issue and repository. These are scored against trajectory-derived ground truth using coverage, ranking, and context-efficiency metrics.
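
The scoring idea above can be sketched in Python. This is a minimal illustration, not the benchmark's actual implementation: the summary does not give SWE-Explore's exact metric definitions, so the names Region, apply_budget, and score, the choice to truncate at the budget, and the specific formulas (line recall for coverage, reciprocal rank of the first relevant region for ranking, fraction of returned lines that are relevant for context efficiency) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical code region: a file path plus an inclusive 1-indexed line range."""
    path: str
    start: int
    end: int


def region_lines(region):
    """Expand a region into a set of (path, line_number) pairs."""
    return {(region.path, n) for n in range(region.start, region.end + 1)}


def apply_budget(ranked, budget):
    """Keep regions in rank order until the line budget is spent.

    Assumption: the region that crosses the budget is truncated, and
    overlapping regions each count fully against the budget.
    """
    kept, remaining = [], budget
    for region in ranked:
        if remaining <= 0:
            break
        length = region.end - region.start + 1
        if length > remaining:
            region = Region(region.path, region.start, region.start + remaining - 1)
            length = remaining
        kept.append(region)
        remaining -= length
    return kept


def score(ranked, gold, budget):
    """Score a ranked list of regions against ground-truth regions (illustrative metrics)."""
    kept = apply_budget(ranked, budget)
    gold_lines = set().union(*(region_lines(g) for g in gold))
    returned = set().union(*(region_lines(r) for r in kept))
    hits = returned & gold_lines

    gold_files = {g.path for g in gold}
    returned_files = {r.path for r in kept}

    # Ranking: reciprocal rank of the first region that touches any gold line.
    reciprocal_rank = 0.0
    for rank, region in enumerate(kept, start=1):
        if region_lines(region) & gold_lines:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "file_recall": len(returned_files & gold_files) / len(gold_files) if gold_files else 0.0,
        "line_recall": len(hits) / len(gold_lines) if gold_lines else 0.0,
        "context_efficiency": len(hits) / len(returned) if returned else 0.0,
        "reciprocal_rank": reciprocal_rank,
    }


# Toy example with made-up paths and line numbers.
gold = [Region("a.py", 10, 19)]
ranked = [Region("b.py", 1, 5), Region("a.py", 10, 14), Region("a.py", 15, 30)]
result = score(ranked, gold, budget=12)
print(result)
```

In this toy case the explorer finds the right file (file recall of 1.0) but the budget cuts off part of the relevant span, so line recall is lower. That mirrors the pattern the summary describes: strong file-level localization with recall limits at the line level.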

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.