On the Reproducibility of Quantum Software Defect Datasets: A Case Study of Bugs4Q

2026-06-26 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

The study evaluates the reproducibility of Bugs4Q, a widely used quantum software defect dataset, through 77,700 quantum program executions covering 37 artifacts and 21 Qiskit core-library versions spanning three years. Reproducibility fell sharply, from 62.2% on Qiskit v0.20.1 to 16.2% on v2.3.1 (the latest version as of April 1, 2026), and 83.8% of artifacts failed to reproduce at least once. Manual analysis of 543 failures attributed 93.6% of them to dependency-related causes, a pattern consistent with classical software defect datasets. The key difference lies in the remedy: only 4.6% of Bugs4Q's failures could be resolved by dependency updates alone, while the majority required source-code modifications. Based on these findings, the researchers curated Bugs4Q-Robust, a patched version that raises reproducibility to 78.4% on Qiskit v2.3.1, a result that demonstrates the need for continuous source-level maintenance in evolving quantum ecosystems.
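
As a quick sanity check on the figures in the summary, the reported percentages can be converted back into whole-artifact counts. This is a minimal sketch, assuming every percentage is taken over the 37 artifacts and that executions are spread evenly over artifact-version pairs; the summary states neither assumption explicitly, so the derived numbers are inferences, not reported results.

```python
# Figures quoted in the summary.
ARTIFACTS = 37
VERSIONS = 21
EXECUTIONS = 77_700


def implied_count(percent: float, total: int = ARTIFACTS) -> int:
    """Nearest whole number of items corresponding to a reported percentage."""
    return round(percent / 100 * total)


# Assumption: executions are distributed evenly over (artifact, version) pairs.
runs_per_pair = EXECUTIONS / (ARTIFACTS * VERSIONS)

# Assumption: each percentage is a share of the 37 artifacts.
artifact_counts = {p: implied_count(p) for p in (62.2, 16.2, 83.8, 78.4)}
```

Under these assumptions the rates correspond to 23, 6, 31, and 29 of the 37 artifacts, and the execution total works out to 100 runs per artifact-version pair.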

Key takeaway

If you are a research scientist or software engineer relying on quantum defect datasets such as Bugs4Q, account for rapid API evolution. Document the exact Qiskit version and environment used in your studies: reproducibility degrades significantly over time, and restoring it often requires source-code modifications, not just dependency pinning. Consider using or contributing to continuously maintained, patched datasets such as Bugs4Q-Robust so that your results remain valid and comparable.

Key insights

Quantum software defect datasets lose reproducibility quickly as the ecosystem evolves, and restoring it often requires source-level patches.

Principles

Reproducibility degrades over time as external dependencies evolve.
Dependency-related issues dominate defect dataset failures.
One-shot patches are insufficient for evolving quantum frameworks.

Method

The authors performed an operational replication of Bugs4Q: they ran 77,700 executions across 21 Qiskit versions, manually classified 543 failures, and curated a patched dataset, Bugs4Q-Robust.
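
The two headline metrics, the per-version reproducibility rate and the share of artifacts that failed at least once, can be illustrated with a small aggregation over an outcome table. This is a sketch only: the data layout, the function names, and the toy values below are hypothetical, and the study's exact criterion for counting an artifact as reproduced is not given here.

```python
from typing import Dict, List

# outcomes[artifact][version] is True if the bug reproduced under that version.
Outcomes = Dict[str, Dict[str, bool]]


def rate_per_version(outcomes: Outcomes) -> Dict[str, float]:
    """Percentage of artifacts that reproduced under each version."""
    versions: List[str] = []
    for runs in outcomes.values():
        for version in runs:
            if version not in versions:  # keep first-seen order
                versions.append(version)
    rates = {}
    for version in versions:
        results = [runs[version] for runs in outcomes.values() if version in runs]
        rates[version] = 100.0 * sum(results) / len(results)
    return rates


def ever_failed_share(outcomes: Outcomes) -> float:
    """Percentage of artifacts that failed under at least one version."""
    failed = sum(1 for runs in outcomes.values() if not all(runs.values()))
    return 100.0 * failed / len(outcomes)


# Toy data for illustration only; these are not results from the study.
toy_outcomes: Outcomes = {
    "artifact_a": {"0.20.1": True, "2.3.1": True},
    "artifact_b": {"0.20.1": True, "2.3.1": False},
    "artifact_c": {"0.20.1": True, "2.3.1": False},
    "artifact_d": {"0.20.1": False, "2.3.1": False},
}
```

Calling `rate_per_version(toy_outcomes)` and `ever_failed_share(toy_outcomes)` on the toy table yields the same two kinds of figures the study reports for the real dataset.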

In practice

Document dataset snapshot and execution environment.
Implement continuous source-level maintenance for quantum datasets.
Develop automated code-migration tools for API evolution.
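
The first recommendation above, documenting the dataset snapshot and execution environment, could be approached with a small helper that records installed package versions next to the results. This is a minimal sketch using only the Python standard library; the function names, the `dataset_revision` field, and the choice of fields are assumptions for illustration, not something the study prescribes.

```python
import json
import platform
from importlib import metadata
from typing import Optional, Sequence


def environment_snapshot(
    packages: Sequence[str] = ("qiskit",),
    dataset_revision: Optional[str] = None,
) -> dict:
    """Collect interpreter, OS, and package versions for a reproducibility log."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        # Hypothetical field: e.g. the commit hash of the dataset checkout.
        "dataset_revision": dataset_revision,
    }


def snapshot_to_json(snapshot: dict) -> str:
    """Serialize the snapshot so it can be stored alongside experiment output."""
    return json.dumps(snapshot, indent=2, sort_keys=True)
```

Storing the serialized snapshot with each experiment run lets a later reader see which Qiskit version produced a given result, although, as the study notes, knowing the version does not by itself guarantee that the artifact still runs.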

Topics

Quantum Software Engineering
Software Defect Datasets
Bugs4Q
Qiskit
Reproducibility
Dependency Management

Best for: AI Scientist, Research Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.