Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models

2026-06-30 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper introduces a fully static, LLM-driven approach to black-box fault localization in system test code (TCFL) that eliminates the need for repeated test executions. The method uses a single failure execution log to estimate the test's execution trace through three novel algorithms, keeping only the code statements likely involved in the failure. The pruned trace, combined with the error message, is then used to prompt a Large Language Model to rank potential faulty locations. The technique operates at the system level without requiring access to the System Under Test (SUT) source code and is applicable to large test scripts. In an evaluation on an industrial dataset not previously used in LLM pre-training, the best estimated trace achieves an F1 score of around 90%. Pruning complex system test code also reduces the LLM's inference time by up to 34% without performance degradation, and block-level TCFL offers a practical balance, achieving an 81% hit rate at top-3 (Hit@3).
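
To make the trace-quality figure concrete, here is a minimal sketch of how an F1 score between an estimated and an actual execution trace could be computed. It assumes both traces are represented as sets of executed line numbers; this summary does not state the paper's exact definition, so the representation and the function name trace_f1 are illustrative assumptions.

```python
def trace_f1(estimated: set, actual: set) -> float:
    """F1 between two traces, each given as a set of executed line numbers.

    Assumption: set-based precision and recall over line numbers; the
    original paper may define its trace F1 differently.
    """
    true_pos = len(estimated & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(estimated)  # share of estimated lines that really ran
    recall = true_pos / len(actual)        # share of executed lines that were recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical traces: 4 of 5 estimated lines are correct, 4 of 5 real lines found.
estimated_trace = {1, 2, 3, 5, 8}
actual_trace = {1, 2, 3, 4, 5}
print(round(trace_f1(estimated_trace, actual_trace), 2))  # 0.8
```

Under this reading, an F1 of around 90% means the estimated trace both covers most statements that actually ran and includes few that did not.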

Key takeaway

For software engineers and QA teams debugging complex system test failures, this LLM-driven black-box fault localization method offers a measurable efficiency gain. Consider adopting static execution trace estimation to prune test code: it reduces LLM inference time by up to 34% without degrading fault localization performance. Favor block-level granularity as a practical balance, with an 81% Hit@3, to narrow the search space without losing crucial context.
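
For readers unfamiliar with the Hit@3 metric, the sketch below shows one common way to compute Hit@k: the fraction of failures for which the true faulty location appears among the top k ranked candidates. The block identifiers and the function name hit_at_k are hypothetical, and the assumption of a single ground-truth location per failure may not match the paper's setup.

```python
def hit_at_k(rankings, truths, k=3):
    """Fraction of failures whose true faulty location is in the top-k ranking.

    rankings: one ranked list of candidate locations per failure (best first).
    truths:   the known faulty location for each failure (assumed single).
    """
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical block-level rankings for three failing tests.
rankings = [
    ["b2", "b7", "b1"],
    ["b4", "b9", "b3", "b5"],
    ["b1", "b2", "b3"],
]
truths = ["b7", "b5", "b1"]
print(hit_at_k(rankings, truths, k=3))  # 2 of 3 failures hit within the top 3
```

An 81% Hit@3 therefore means that for roughly four out of five failures, the engineer needs to inspect at most three suggested blocks.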

Key insights

LLMs can localize faults in complex system test code statically by estimating execution traces from single failure logs.

Principles

Execution trace estimation improves LLM fault localization.
Pruning input reduces LLM inference time and cost.
Block-level granularity balances context and search space.

Method

A two-phase approach: first, statically estimate the execution trace using log-to-source matching and control-flow graph (CFG) analysis; then prompt an LLM with the pruned code and the error message to produce a ranked list of likely fault locations.
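
The two phases can be illustrated with a deliberately simplified sketch. It assumes a hypothetical test script whose log("...") calls carry literal messages, matches those literals against a failure log, and builds a prompt from the matched lines and the error message. The script, the log format, the regular expression, and the prompt wording are all assumptions for illustration; the paper's three algorithms are not reproduced here, and no LLM is actually called.

```python
import re

# Hypothetical system test script; each log() call has a literal message.
TEST_SOURCE = '''\
log("connecting to device")
open_session()
if ready():
    log("device ready")
    start_transfer()
else:
    log("device not ready")
    abort()
log("transfer finished")
'''

# Hypothetical log from the single failing run, plus its error message.
FAILURE_LOG = '''\
connecting to device
device ready
'''
ERROR_MESSAGE = "TimeoutError: transfer did not complete"

LOG_CALL = re.compile(r'log\("([^"]*)"\)')


def match_log_to_source(source, log):
    """Phase 1 (simplified): 1-based source lines whose log literal was emitted."""
    emitted = {line.strip() for line in log.splitlines() if line.strip()}
    matched = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        found = LOG_CALL.search(line)
        if found and found.group(1) in emitted:
            matched.append(lineno)
    return matched


def build_prompt(source, kept_lines, error_message):
    """Phase 2 (simplified): assemble the text that would be sent to an LLM."""
    lines = source.splitlines()
    snippet = "\n".join(f"{n}: {lines[n - 1]}" for n in sorted(kept_lines))
    return (
        "The following test statements likely executed before the failure:\n"
        f"{snippet}\n\n"
        f"Error message: {error_message}\n"
        "Rank the statements most likely to contain the fault."
    )


anchors = match_log_to_source(TEST_SOURCE, FAILURE_LOG)
print(anchors)  # [1, 4]
print(build_prompt(TEST_SOURCE, anchors, ERROR_MESSAGE))
```

Matched log statements alone give only anchor points; statements between and after them still have to be inferred, which is where control-flow analysis comes in.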

In practice

Use static analysis to estimate execution traces.
Combine line-level and CFG-based estimation for accuracy.
Prioritize block-level fault localization for efficiency.
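
One way to picture combining line-level matches with CFG-based estimation is the sketch below. It assumes a hand-written control-flow graph over line numbers for a small hypothetical test, a list of lines whose log messages appeared in the failure log (anchors), and a set of log lines whose messages did not appear (unlogged). The heuristic, keeping nodes on paths between consecutive anchors that avoid unlogged statements, plus whatever is reachable after the last anchor, is an illustrative assumption and not the paper's algorithm.

```python
from collections import deque


def reachable(graph, start, blocked):
    """Nodes reachable from start without entering any blocked node (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Reverse every edge so that reachability can be computed backwards."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def estimate_trace(cfg, anchors, unlogged):
    """Estimate executed lines from log anchors and a CFG (illustrative heuristic)."""
    rev = reverse(cfg)
    trace = set(anchors)
    # Between consecutive anchors: nodes on some path that avoids unlogged statements.
    for a, b in zip(anchors, anchors[1:]):
        trace |= reachable(cfg, a, unlogged) & reachable(rev, b, unlogged)
    # After the last anchor the failure may occur anywhere still reachable.
    if anchors:
        trace |= reachable(cfg, anchors[-1], unlogged)
    return sorted(trace)


# Hypothetical CFG: line 3 branches to line 4 (then) or line 7 (else).
CFG = {1: [2], 2: [3], 3: [4, 7], 4: [5], 5: [9], 7: [8], 8: [], 9: []}
anchors = [1, 4]    # lines whose log messages were found in the failure log
unlogged = {7, 9}   # log lines whose messages were absent from the failure log

print(estimate_trace(CFG, anchors, unlogged))  # [1, 2, 3, 4, 5]
```

In this toy case the else branch (lines 7 and 8) and the final log statement (line 9) are pruned, while line 5, which follows the last matched log statement, stays in the input given to the LLM.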

Topics

Fault Localization
Large Language Models
Test Code Analysis
Execution Trace Estimation
Black-Box Testing
Software Debugging

Code references

joernio/joern

Best for: AI Scientist, Research Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.