On the Reproducibility of Quantum Software Defect Datasets: A Case Study of Bugs4Q
Summary
A study investigated the reproducibility of Bugs4Q, a widely used quantum software defect dataset, across 77,700 quantum program executions of 37 artifacts over 21 Qiskit core-library versions spanning three years. The research found that Bugs4Q's reproducibility sharply declined from 62.2% on Qiskit v0.20.1 to 16.2% on v2.3.1, the latest version as of April 1, 2026, with 83.8% of artifacts experiencing reproduction failures at least once. Manual analysis of 543 failures revealed that 93.6% were dependency-related, consistent with classical software defect datasets. However, a key difference emerged: only 4.6% of Bugs4Q's failures could be resolved by dependency updates alone, with the majority requiring source-code modifications. Based on these findings, the researchers curated Bugs4Q-Robust, a patched version that increased reproducibility to 78.4% on Qiskit v2.3.1, demonstrating the need for continuous source-level maintenance in evolving quantum ecosystems.
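The percentages above can be sanity-checked with simple arithmetic. The sketch below is a reader-side check, not the study's own analysis. It assumes that executions were spread evenly over artifact/version pairs, that the artifact-level percentages use all 37 artifacts as the denominator, and that the failure percentages use all 543 analyzed failures. The helper names `implied_count` and `as_percent` are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the summary.
ARTIFACTS = 37
VERSIONS = 21
EXECUTIONS = 77_700
FAILURES = 543

# Assumption: executions are spread evenly over artifact/version pairs.
runs_per_pair = EXECUTIONS // (ARTIFACTS * VERSIONS)


def implied_count(percent: float, total: int) -> int:
    """Nearest whole count that a rounded percentage corresponds to."""
    return round(percent / 100 * total)


def as_percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the summary."""
    return round(100 * count / total, 1)


# Assumption: these percentages are fractions of the 37 artifacts.
reproducible_oldest = implied_count(62.2, ARTIFACTS)   # Qiskit v0.20.1
reproducible_latest = implied_count(16.2, ARTIFACTS)   # Qiskit v2.3.1
failed_at_least_once = implied_count(83.8, ARTIFACTS)
reproducible_patched = implied_count(78.4, ARTIFACTS)  # Bugs4Q-Robust

# Assumption: these percentages are fractions of the 543 failures.
dependency_related = implied_count(93.6, FAILURES)
fixed_by_update_only = implied_count(4.6, FAILURES)

print(runs_per_pair, reproducible_oldest, reproducible_latest)
print(as_percent(reproducible_oldest, ARTIFACTS))
```

Under these assumptions the numbers are internally consistent: 100 executions per artifact/version pair, 23 reproducible artifacts on v0.20.1, 6 on v2.3.1, and 29 after patching.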
Key takeaway
If you rely on quantum defect datasets such as Bugs4Q, account for rapid API evolution. Document the exact Qiskit version and environment used in your studies: reproducibility degrades substantially over time, and restoring it often requires source-code modifications rather than dependency pinning alone. Consider using or contributing to continuously maintained, patched datasets such as Bugs4Q-Robust so that your results stay valid and comparable.
Key insights
Rapid ecosystem evolution severely degrades the reproducibility of quantum software defect datasets, and restoring it often requires source-level patches.
Principles
- Reproducibility degrades over time as external dependencies evolve.
- Dependency-related issues dominate defect dataset failures.
- One-shot patches are insufficient for evolving quantum frameworks.
Method
The study performed an operational replication of Bugs4Q: it ran 77,700 executions of 37 artifacts across 21 Qiskit versions, manually classified 543 failures, and curated a patched dataset, Bugs4Q-Robust.
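The two headline metrics (the per-version reproducibility rate and the share of artifacts that failed at least once) can be sketched as a small aggregation over an outcome log. This is a minimal illustration, not the study's pipeline. The record layout, the artifact names, and the outcomes are hypothetical, and each artifact/version pair is collapsed to a single pass/fail outcome.

```python
from collections import defaultdict

# Hypothetical outcome log: (artifact_id, qiskit_version, reproduced?).
# Artifact names and outcomes are illustrative, not data from the study.
outcomes = [
    ("bug_01", "0.20.1", True),
    ("bug_01", "2.3.1", False),
    ("bug_02", "0.20.1", True),
    ("bug_02", "2.3.1", True),
    ("bug_03", "0.20.1", False),
    ("bug_03", "2.3.1", False),
]


def rate_per_version(records):
    """Fraction of artifacts that reproduced on each version."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for _artifact, version, ok in records:
        totals[version] += 1
        passed[version] += int(ok)
    return {version: passed[version] / totals[version] for version in totals}


def ever_failed(records):
    """Artifacts with at least one reproduction failure on any version."""
    return {artifact for artifact, _version, ok in records if not ok}


rates = rate_per_version(outcomes)
failed = ever_failed(outcomes)
artifact_count = len({artifact for artifact, _v, _ok in outcomes})

for version, rate in rates.items():
    print(f"Qiskit {version}: {rate:.1%} reproducible")
print(f"{len(failed) / artifact_count:.1%} of artifacts failed at least once")
```

Keeping the raw per-pair outcomes, rather than only the aggregated rates, makes it possible to recompute both metrics later when new framework versions are added.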
In practice
- Document dataset snapshot and execution environment.
- Implement continuous source-level maintenance for quantum datasets.
- Develop automated code-migration tools for API evolution.
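The first recommendation in the list above can be sketched as a small manifest written next to each experiment's results. This is a minimal sketch using only the Python standard library. The function names, the manifest fields, and the `<commit-hash>` placeholder are assumptions for illustration, not a format prescribed by the study. A package that is not installed is recorded as `null`.

```python
import json
import platform
from importlib import metadata


def package_version(name: str):
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_record(dataset_commit: str, packages=("qiskit",)) -> dict:
    """Collect the facts needed to rerun an experiment later."""
    return {
        # Hypothetical field: commit hash or tag of the dataset snapshot.
        "dataset_commit": dataset_commit,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
    }


# "<commit-hash>" is a placeholder; substitute the snapshot actually used.
record = environment_record(dataset_commit="<commit-hash>")
manifest = json.dumps(record, indent=2, sort_keys=True)
print(manifest)
```

Since the study found that dependency updates alone rarely restore reproducibility, a manifest like this is a record of what was run rather than a guarantee that the run can be repeated on newer versions.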
Topics
- Quantum Software Engineering
- Software Defect Datasets
- Bugs4Q
- Qiskit
- Reproducibility
- Dependency Management
Best for: AI Scientist, Research Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.