iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation
Summary
iCoRe is an iterative, correlation-aware context retrieval approach for automatically generating bug reproduction tests (BRTs) from issue descriptions. It addresses three limitations of existing Large Language Model (LLM)-based retrieval methods: it retrieves source code and test cases with different strategies, it uses function call relationships in addition to semantic similarity, and it feeds results from the generation phase back into retrieval for iterative refinement. Evaluated with an LLM-based BRT generator on the SWT-bench Lite and TDD-bench Verified benchmarks, iCoRe achieved Fail-to-Pass rates of 42.0% and 52.8%, respectively. This is a 19.7% to 31.7% relative improvement over prior retrieval methods. It also outperforms the state-of-the-art e-Otter++ with GPT-4o (40.2% and 51.4%) while being 6.4 times more cost-effective. The retrieved context is also more concise, averaging approximately 2,500 tokens.
Key takeaway
Machine learning engineers building automated bug reproduction systems should consider adopting iCoRe's correlation-aware retrieval. Its differentiated strategy for code and tests, combined with iterative refinement and function call analysis, improves both the accuracy and the cost-effectiveness of BRT generation. Because the approach is generator-agnostic, integrating it into an existing pipeline can raise Fail-to-Pass rates and reduce expensive LLM calls.
Key insights
iCoRe improves bug reproduction test generation through correlation-aware, iterative context retrieval. It handles production code and test cases differently and adds function call analysis to similarity-based retrieval.
Principles
- Production code and test case retrieval demand distinct strategies.
- Function call relationships reveal behavioral relevance beyond text.
- Iterative feedback from generation refines context retrieval.
Method
iCoRe employs a two-stage process: heuristic-based production code retrieval, followed by iterative test code retrieval. The pipeline involves generating a sketch BRT, computing textual and function-call similarity, and reranking candidates with an LLM.
In practice
- Differentiate retrieval for production code and test cases.
- Use function call analysis for behavioral code relevance.
- Implement generation-to-retrieval feedback loops.
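The generation-to-retrieval feedback loop in the last point can be sketched as below. This is a hedged outline, not the paper's algorithm: `generate_sketch` and `retrieve` are hypothetical callables standing in for an LLM-based generator and a retriever, and the stopping rule (stop when the retrieved context no longer changes, or after `max_rounds`) is an assumption.

```python
from typing import Callable, List


def refine_context(
    issue: str,
    generate_sketch: Callable[[str, List[str]], str],
    retrieve: Callable[[str, str], List[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Alternate between sketch generation and retrieval.

    Each round generates a sketch test from the current context, then uses
    that sketch as an extra query to retrieve a new context.
    """
    context: List[str] = []
    for _ in range(max_rounds):
        sketch = generate_sketch(issue, context)
        new_context = retrieve(issue, sketch)
        if new_context == context:
            break  # retrieval has stabilized; further rounds add nothing
        context = new_context
    return context
```

Passing the generator and retriever in as parameters keeps the loop generator-agnostic, so either side can be swapped without touching the control flow.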
Topics
- Bug Reproduction
- LLM-based Code Retrieval
- Software Testing Automation
- Function Call Graphs
- Iterative Context Refinement
- Test Generation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.