PracRepair: LLM-Empowered Automated Program Repair Inspired by Human-Like Debugging Practices
Summary
PracRepair is an LLM-based automated program repair (APR) framework modeled on how human developers debug. It targets a limitation of existing LLM-based APR methods: they rely mostly on static context, error messages, and coarse pass/fail validation, and so overlook the dynamic information available from failure executions and patch validation. PracRepair constructs an on-demand static-dynamic context from the buggy program and its failure executions. It then uses question-driven failure diagnosis to generate explicit repair hypotheses, and iteratively refines patches using validation diagnostics and trace-level behavioral changes. On Defects4J V1.2 and V2.0, PracRepair fixed 139 and 136 bugs, respectively, with GPT-3.5; with GPT-4o, the counts rose to 162 and 171. The framework also generalizes to Real-World Bugs (RWB), where it achieves top performance across various foundation models.
Key takeaway
If you build LLM-based automated program repair systems, feed dynamic execution and validation feedback into them rather than relying on static context alone. PracRepair's results indicate that on-demand static-dynamic context combined with iterative patch refinement increases the number of bugs fixed on both Defects4J and real-world bugs, which suggests a path to more robust repair systems.
Key insights
PracRepair enhances LLM-based automated program repair by integrating dynamic execution and validation feedback, mimicking human debugging.
Principles
- Human-like debugging improves LLM-based APR.
- Dynamic execution context is crucial for failure diagnosis.
- Iterative patch refinement uses trace-level changes.
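To make "trace-level changes" concrete, the sketch below compares the execution trace of a failing run with the trace of a patched run and reports only the regions that differ. It is an illustration under assumptions, not PracRepair's implementation: the trace format (a list of `file:line` strings), the function name `trace_changes`, and the sample traces are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def trace_changes(
    before: List[str], after: List[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return the non-matching regions between two execution traces.

    Each result is (tag, lines_only_in_before, lines_only_in_after), where
    tag is one of difflib's opcodes: 'replace', 'delete', or 'insert'.
    """
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    return [
        (tag, before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical traces: the patch makes execution take line 12 instead of line 14.
failing_run = ["Foo.java:10", "Foo.java:11", "Foo.java:14", "Foo.java:20"]
patched_run = ["Foo.java:10", "Foo.java:11", "Foo.java:12", "Foo.java:20"]
changes = trace_changes(failing_run, patched_run)
```

A summary like `changes` could be passed back to the model so that a failed patch still yields information about how behavior shifted, rather than only a pass/fail signal.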
Method
PracRepair constructs on-demand static-dynamic context, performs question-driven failure diagnosis for hypotheses, and iteratively refines patches using validation diagnostics and trace-level behavioral changes.
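The three stages described above can be sketched as a small control loop. This is a minimal sketch assuming the LLM calls and the test runner are supplied as plain callables; `repair_loop`, `ValidationResult`, the parameter names, and the round limit are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    passed: bool
    diagnostics: str  # e.g. the failing assertion message
    trace: List[str] = field(default_factory=list)  # trace of the patched run


def repair_loop(
    buggy_code: str,
    failure_trace: List[str],
    diagnose: Callable[[str, List[str]], str],
    propose_patch: Callable[[str, str, str], str],
    validate: Callable[[str], ValidationResult],
    max_rounds: int = 3,
) -> Optional[str]:
    """Diagnose once, then propose and validate patches until one passes."""
    # Stage 1: diagnosis over static + dynamic context yields a repair hypothesis.
    hypothesis = diagnose(buggy_code, failure_trace)
    feedback = ""
    for _ in range(max_rounds):
        # Stage 2: candidate patch from the hypothesis and the previous feedback.
        patch = propose_patch(buggy_code, hypothesis, feedback)
        # Stage 3: validate; on failure, build feedback for the next round.
        result = validate(patch)
        if result.passed:
            return patch
        # Crude behavioral-change signal: trace entries not seen in the failing run.
        newly_executed = [t for t in result.trace if t not in failure_trace]
        feedback = f"{result.diagnostics}; newly executed: {newly_executed}"
    return None
```

In a real system, `diagnose` and `propose_patch` would wrap model calls and `validate` would run the project's test suite. Keeping them as injected callables makes the loop testable with stubs.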
In practice
- Integrate dynamic traces for bug localization.
- Use question-driven diagnosis for repair hypotheses.
- Refine patches with validation feedback.
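One way to act on the first two points is to assemble a diagnosis prompt that pairs the buggy code with the tail of its failing execution trace and a fixed set of questions. The sketch below assumes a plain-text prompt. The questions, section titles, and the `build_diagnosis_prompt` helper are illustrative inventions, not the prompts used by PracRepair.

```python
from typing import List

# Hypothetical diagnostic questions; illustrative only, not taken from the paper.
DIAGNOSTIC_QUESTIONS = [
    "Which executed statement first produces a value the test does not expect?",
    "What input or program state triggers the failure?",
    "What minimal change would make the failing assertion hold?",
]


def build_diagnosis_prompt(
    method_source: str,
    error_message: str,
    trace: List[str],
    max_trace_lines: int = 20,
) -> str:
    """Combine static context, failure output, and a trace tail into one prompt."""
    # Keep only the most recent entries: the steps closest to the failure.
    trace_tail = trace[-max_trace_lines:] if max_trace_lines > 0 else []
    questions = "\n".join(
        f"{i}. {q}" for i, q in enumerate(DIAGNOSTIC_QUESTIONS, start=1)
    )
    sections = [
        "Buggy method:\n" + method_source,
        "Failing test output:\n" + error_message,
        f"Execution trace (last {len(trace_tail)} entries):\n" + "\n".join(trace_tail),
        "Answer each question, then state one repair hypothesis:\n" + questions,
    ]
    return "\n\n".join(sections)
```

Truncating the trace keeps the prompt within a model's context budget. How much trace to include, and which questions to ask, are tuning decisions this sketch does not settle.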
Topics
- Automated Program Repair
- Large Language Models
- Dynamic Debugging
- Software Testing
- Defects4J
- GPT-4o
Best for: AI Scientist, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.