Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports

2025-11-07 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

A study by Mahmut Furkan Gön et al. introduces and evaluates IssueSupport, an automated framework for subclassifying invalid bug reports and generating no-code fixes. The research establishes a standardized, root-cause-oriented taxonomy for invalid bug reports, with categories such as "External System & Dependency Issues" and "Faulty Configuration." Using a manually curated benchmark from the Brave browser repository, the authors compared vanilla LLMs, Retrieval-Augmented Generation (RAG), and agentic web search. For subclassification, RAG achieved the highest overall weighted F1-score at 0.66, narrowly ahead of vanilla LLMs (0.65) and agentic web search (0.64). For no-code fix generation, agentic web search with Gemini 3.1 Pro achieved the highest overall Judge LLM success rate at 68.9%, ahead of vanilla LLMs (64.9%) and RAG (64.4%). The study concludes that the best approach depends on the task: RAG leads in subclassification, and agentic web search leads in fix generation.

Key takeaway

AI engineers building bug triage systems should consider a hybrid approach: first ground the LLM with project-specific RAG to subclassify invalid bug reports, then use agentic web search to generate actionable no-code fixes. This combination targets both accurate categorization and effective resolution, particularly for complex issues such as "Wrong Version" or "Faulty Configuration," which benefit from dynamic external context. Validate generated fixes with a Judge LLM to check functional correctness rather than relying solely on semantic similarity.

Key insights

Automating invalid bug report subclassification and no-code fix generation can reduce software maintenance overhead.

Principles

Context engineering improves LLM performance.
Different LLM approaches suit different tasks.
Semantic similarity metrics can be misleading.

Method

The IssueSupport framework uses vanilla LLMs, RAG, and agentic web search to classify invalid bug reports by root cause and to generate no-code fixes. Results are evaluated with F1-score, BERTScore, and Judge LLM success rate.
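To make two of these metrics concrete, here is a minimal sketch of a support-weighted F1-score and a Judge LLM success rate. The function names `weighted_f1` and `judge_success_rate` and the sample data are illustrative assumptions, not taken from the paper's code; only the category names come from the summary above.

```python
from collections import Counter
from typing import List


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Average per-class F1, weighting each class by its share of true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def judge_success_rate(verdicts: List[bool]) -> float:
    """Fraction of generated fixes that the judge marked as successful."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


# Hypothetical gold labels and predictions for four invalid bug reports.
gold = [
    "Faulty Configuration",
    "Faulty Configuration",
    "Wrong Version",
    "External System & Dependency Issues",
]
pred = [
    "Faulty Configuration",
    "Wrong Version",
    "Wrong Version",
    "External System & Dependency Issues",
]
print(f"weighted F1: {weighted_f1(gold, pred):.2f}")
print(f"judge success rate: {judge_success_rate([True, False, True, True]):.2f}")
```

Weighting by support means frequent subclasses dominate the overall score, so per-class F1 is still worth inspecting for rare categories.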

In practice

Use RAG for bug report subclassification.
Employ agentic web search for no-code fix generation.
Prioritize Judge LLM evaluation over BERTScore.
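
The three recommendations above can be combined into one triage flow. The sketch below is a hypothetical outline, not the IssueSupport implementation: retrieval uses simple token overlap as a stand-in for an embedding index, and the `classify`, `propose_fix`, and `judge` callables are placeholders for whatever LLM, web-search agent, and Judge LLM a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LabeledReport:
    text: str
    subclass: str


def _tokens(text: str) -> set:
    return set(text.lower().split())


def retrieve_similar(query: str, corpus: List[LabeledReport], k: int = 3) -> List[LabeledReport]:
    """Rank past reports by Jaccard token overlap (stand-in for embedding search)."""
    q = _tokens(query)

    def score(report: LabeledReport) -> float:
        t = _tokens(report.text)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]


def build_prompt(report: str, examples: List[LabeledReport]) -> str:
    """Place retrieved, already-labeled reports ahead of the new report."""
    shots = "\n".join(f"Report: {e.text}\nSubclass: {e.subclass}" for e in examples)
    return f"{shots}\nReport: {report}\nSubclass:"


def triage(
    report: str,
    corpus: List[LabeledReport],
    classify: Callable[[str], str],          # assumed: RAG-grounded LLM call
    propose_fix: Callable[[str, str], str],  # assumed: agentic web-search call
    judge: Callable[[str, str], bool],       # assumed: Judge LLM verdict
) -> Tuple[str, Optional[str]]:
    """Subclassify with retrieved context, then keep a fix only if the judge accepts it."""
    examples = retrieve_similar(report, corpus)
    subclass = classify(build_prompt(report, examples))
    fix = propose_fix(report, subclass)
    return subclass, (fix if judge(report, fix) else None)


# Usage with stub callables in place of real model calls.
corpus = [
    LabeledReport("shields setting disabled in brave config", "Faulty Configuration"),
    LabeledReport("crash on outdated version of the browser", "Wrong Version"),
]
result = triage(
    "browser crash on old version",
    corpus,
    classify=lambda prompt: "Wrong Version",
    propose_fix=lambda report, subclass: "Update to the latest release",
    judge=lambda report, fix: True,
)
print(result)
```

Passing the model calls in as callables keeps the retrieval and gating logic testable without network access, and makes it easy to swap approaches per task.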

Topics

Invalid Bug Reports
Root Cause Subclassification
No-Code Fix Generation
Large Language Models
Retrieval-Augmented Generation

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.