iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

iCoRe is an iterative, correlation-aware context retrieval approach for automatically generating bug reproduction tests (BRTs) from issue descriptions. It addresses three limitations of existing Large Language Model (LLM)-based retrieval methods: it retrieves source code and test cases with different strategies, it considers function call relationships in addition to semantic similarity, and it feeds results from the generation phase back into retrieval for iterative refinement. Paired with an LLM-based BRT generator, iCoRe achieved Fail-to-Pass rates of 42.0% on SWT-bench Lite and 52.8% on TDD-bench Verified, a 19.7%–31.7% relative improvement over prior retrieval methods. It also outperforms the state-of-the-art e-Otter++ with GPT-4o (40.2% and 51.4%) while being 6.4 times more cost-effective, and it supplies more concise context, averaging approximately 2,500 tokens.

Key takeaway

Machine Learning Engineers building automated bug reproduction systems should consider adopting iCoRe's correlation-aware retrieval. Its differentiated handling of code and tests, combined with iterative refinement and function call analysis, improves both BRT generation accuracy and cost-effectiveness. Because the approach is generator-agnostic, it can be integrated into an existing pipeline to raise Fail-to-Pass rates and reduce expensive LLM calls.

Key insights

iCoRe improves bug reproduction test generation through correlation-aware, iterative context retrieval that treats production code and tests differently and incorporates function call analysis.

Method

iCoRe employs a two-stage process: heuristic-based production code retrieval, followed by iterative test code retrieval. The process involves generating a sketch BRT, computing textual and function-call similarity, and reranking candidates with an LLM.
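The similarity step described above can be illustrated with a small sketch. This is not iCoRe's actual implementation: the Jaccard measure, the `alpha` weight, and the names `tokens`, `called_names`, and `rank_tests` are all assumptions made for illustration, and the LLM-based reranking and generator feedback loop are left out. It only shows how a textual score and a function-call score could be blended to rank existing tests against a sketch BRT.

```python
import ast
import re


def tokens(text):
    """Lowercased identifier-like tokens, used for textual similarity."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()))


def called_names(source):
    """Names of functions and methods called anywhere in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id)
            elif isinstance(func, ast.Attribute):
                names.add(func.attr)
    return names


def jaccard(a, b):
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_tests(sketch, candidates, alpha=0.5, top_k=2):
    """Rank candidate tests by a weighted mix of textual and call overlap.

    alpha is a hypothetical weight: 1.0 uses text only, 0.0 uses calls only.
    """
    sketch_tokens = tokens(sketch)
    sketch_calls = called_names(sketch)
    scored = []
    for name, source in candidates.items():
        text_sim = jaccard(sketch_tokens, tokens(source))
        call_sim = jaccard(sketch_calls, called_names(source))
        scored.append((alpha * text_sim + (1 - alpha) * call_sim, name))
    # Highest score first; ties broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical sketch BRT and candidate tests.
sketch = "def test_bug():\n    assert parse_date('2020-02-30') is None\n"
candidates = {
    "test_parse_date_valid": (
        "def test_parse_date_valid():\n"
        "    assert parse_date('2020-01-01') is not None\n"
    ),
    "test_render": (
        "def test_render():\n"
        "    assert render_page('home') == '<html>'\n"
    ),
}
print(rank_tests(sketch, candidates, top_k=1))
```

In this toy case the candidate that calls the same function as the sketch ranks first. In a fuller pipeline, the top-ranked candidates would then be passed to an LLM for reranking.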

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.