Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A novel, fully static, LLM-driven approach to black-box fault localization in system test code (TCFL) removes the need for repeated test executions. From a single failure execution log, three novel algorithms estimate the test's execution trace and keep only the code statements likely involved in the failure. The pruned trace, together with the error message, is then used to prompt a Large Language Model (LLM) to rank likely fault locations. Because it operates at the system level and does not require access to the System Under Test (SUT) source code, the technique applies to large test scripts. On an industrial dataset that had not been used in LLM pre-training, the best estimated trace reaches an F1 score of about 90%. Pruning complex system test code also cuts the LLM's inference time by up to 34% without degrading localization performance. Block-level TCFL offers a practical balance, reaching an 81% hit rate at top 3 (Hit@3).
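
The trace-estimation F1 score above can be read as set overlap between the statements an estimator keeps and the statements that actually executed. The sketch below is a minimal illustration under that assumption, identifying statements by line number; the helper name `trace_f1` is hypothetical, and this summary does not state the paper's exact matching unit.

```python
from __future__ import annotations


def trace_f1(estimated: set[int], actual: set[int]) -> float:
    """F1 score of an estimated execution trace against the statements that
    really executed, both given as sets of line numbers."""
    true_positives = len(estimated & actual)
    if true_positives == 0:
        return 0.0  # also covers empty sets, avoiding division by zero
    precision = true_positives / len(estimated)  # kept lines that truly ran
    recall = true_positives / len(actual)        # executed lines that were kept
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the estimator keeps line 5 (never ran) and misses line 4.
print(trace_f1({1, 2, 3, 5}, {1, 2, 3, 4}))  # 0.75
```

High precision means the LLM sees little irrelevant code; high recall means the faulty statement is unlikely to have been pruned away.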

Key takeaway

For software engineers and QA teams debugging complex system test failures, this LLM-driven black-box fault localization method offers a significant efficiency gain. Consider adopting static execution-trace estimation to prune test code: it reduced LLM inference time by up to 34% without degrading fault localization accuracy. Favor block-level granularity for a practical balance: at 81% Hit@3, it narrows the search space effectively without losing crucial context.
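
To make the Hit@3 figure concrete, the sketch below computes a block-level hit rate from a statement-level ranking. It is a hypothetical evaluation helper, assuming blocks are contiguous line ranges identified by their start lines (`block_starts`); the names `line_to_block`, `rank_blocks` and `hit_rate_at_k` are illustrative, and how the paper actually delimits blocks is not stated in this summary.

```python
from __future__ import annotations

from bisect import bisect_right


def line_to_block(line: int, block_starts: list[int]) -> int:
    """Index of the block containing `line`. `block_starts` is sorted and its
    first entry is the first line of the test script."""
    return bisect_right(block_starts, line) - 1


def rank_blocks(ranked_lines: list[int], block_starts: list[int]) -> list[int]:
    """Collapse a statement-level ranking into a block-level ranking, keeping
    each block at the best rank any of its statements received."""
    order: list[int] = []
    for line in ranked_lines:
        block = line_to_block(line, block_starts)
        if block not in order:
            order.append(block)
    return order


def hit_rate_at_k(cases: list[tuple[list[int], int, list[int]]], k: int = 3) -> float:
    """Fraction of failures whose faulty block appears in the top-k blocks.
    Each case is (ranked_lines, faulty_line, block_starts)."""
    hits = sum(
        line_to_block(faulty, starts) in rank_blocks(ranked, starts)[:k]
        for ranked, faulty, starts in cases
    )
    return hits / len(cases)


# Hypothetical test script split into blocks starting at lines 1, 10, 20, 30.
starts = [1, 10, 20, 30]
cases = [
    ([12, 15, 25, 3, 31], 5, starts),  # statement-level top 3 misses line 5
    ([31, 22], 12, starts),            # faulty block never ranked
]
print(hit_rate_at_k(cases))  # 0.5
```

In this toy data the first failure misses at statement level, since line 5 is not among lines 12, 15 and 25, yet it counts as a hit at block level. Coarser units make a hit easier to score but leave more code to inspect, which is one way to read the trade-off behind block-level granularity.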

Key insights

LLMs can localize faults in complex system test code statically by estimating execution traces from single failure logs.
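
Estimating a trace from a single failure log starts by linking each log line back to the statement that emitted it. The sketch below is a minimal, hypothetical illustration: it assumes the test scripts log through calls such as `log.info("...")` with `%s` or `{}` placeholders, and the names `template_to_regex` and `match_log_to_source` are invented here. It is not one of the paper's three algorithms, which this summary does not describe.

```python
from __future__ import annotations

import re

# Hypothetical convention: the test script logs via log/logger/logging.<level>("...").
LOG_CALL = re.compile(
    r"""(?:logger|logging|log)\.(?:debug|info|warning|error)\(\s*f?(["'])(.*?)\1"""
)
PLACEHOLDER = r"%[sdfr]|\{[^}]*\}"  # printf-style or str.format/f-string fields


def template_to_regex(template: str) -> re.Pattern[str]:
    """Escape the literal text of a log template and turn placeholders into wildcards."""
    parts = re.split(f"({PLACEHOLDER})", template)
    return re.compile("".join(
        ".+?" if re.fullmatch(PLACEHOLDER, part) else re.escape(part)
        for part in parts
    ))


def match_log_to_source(source: str, log_lines: list[str]) -> list[int]:
    """Map each failure-log line to the source line of the first logging call
    whose template matches it; unmatched log lines are skipped."""
    templates = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        call = LOG_CALL.search(text)
        if call:
            templates.append((lineno, template_to_regex(call.group(2))))
    matched = []
    for entry in log_lines:
        for lineno, pattern in templates:
            if pattern.search(entry):
                matched.append(lineno)
                break
    return matched


test_script = '''def test_upload():
    log.info("Connecting to %s", host)
    session = connect(host)
    log.info(f"Uploading {name}")
    session.upload(name)
    log.info("Upload done")'''
failure_log = [
    "12:00:01 INFO Connecting to 10.0.0.5",
    "12:00:02 INFO Uploading fw.bin",
    "12:00:09 ERROR TimeoutError: upload timed out",
]
print(match_log_to_source(test_script, failure_log))  # [2, 4]
```

Ambiguous templates (two logging calls with the same text) resolve to the first match here; a fuller estimator would also need log order and control flow to disambiguate them.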

Principles

Method

A two-phase approach: first, statically estimate the execution trace by matching failure-log lines to source statements and analyzing the control-flow graph (CFG); then, prompt an LLM with the pruned code and the error message to rank likely fault locations.
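
The two phases can be pictured with a small, hypothetical example. The sketch assumes a CFG given as a dictionary from line numbers to successor line numbers and applies one simple pruning rule: a logging statement whose message is absent from the failure log did not execute, so no path through it is kept. The helper names `estimate_trace` and `build_prompt` and the prompt wording are illustrative assumptions, not the paper's method.

```python
from __future__ import annotations

from collections import deque

# Toy test script (hypothetical); line numbers are 1-based.
SOURCE = [
    "def test_upload():",
    '    log.info("Connecting to %s", host)',
    "    session = connect(host)",
    "    if session.secure:",
    '        log.info("Using TLS")',
    "        session.enable_tls()",
    '    log.info(f"Uploading {name}")',
    "    session.upload(name)",
    '    log.info("Upload done")',
]
# Control-flow graph: line -> successor lines (line 4 branches).
CFG = {1: [2], 2: [3], 3: [4], 4: [5, 7], 5: [6], 6: [7], 7: [8], 8: [9]}
LOG_NODES = {2, 5, 7, 9}  # lines that emit a log message
MATCHED = {2, 7}          # lines whose messages appear in the failure log


def estimate_trace(cfg: dict[int, list[int]], entry: int,
                   log_nodes: set[int], matched: set[int]) -> set[int]:
    """Breadth-first search from `entry` that never enters a logging
    statement whose message is missing from the failure log."""
    blocked = log_nodes - matched
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in cfg.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_prompt(source: list[str], trace: set[int], error: str, top_k: int = 3) -> str:
    """Assemble an LLM prompt from the pruned statements and the error message."""
    kept = "\n".join(f"{n}: {source[n - 1]}" for n in sorted(trace))
    return (
        f"A system test failed with this error:\n{error}\n\n"
        f"Statements estimated to have executed (line: code):\n{kept}\n\n"
        f"Rank the {top_k} line numbers most likely to contain the fault."
    )


trace = estimate_trace(CFG, entry=1, log_nodes=LOG_NODES, matched=MATCHED)
prompt = build_prompt(SOURCE, trace, "TimeoutError: upload timed out")
print(sorted(trace))  # [1, 2, 3, 4, 7, 8]
```

Here line 6 is pruned because the only path to it passes through the silent TLS log call on line 5, and line 9 is pruned because "Upload done" never appeared, so the LLM sees six statements instead of nine.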

Best for: AI Scientist, Research Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.