Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Software Development & Engineering, Artificial Intelligence & Machine Learning) · Depth: Advanced, extended

Summary

An empirical study at Mozilla evaluated 25 change-point detection (CPD) methods and 15 ensemble approaches with the aim of improving performance anomaly detection in its Perfherder system. Mozilla's current method, based on Student's t-test, generates 12.5% false positives and misses approximately 6.8% of regressions. For benchmarking, the researchers built a ground-truth dataset of 174 performance time series, manually annotated by eleven Mozilla engineers. The results indicate that offline and hybrid CPD methods improve recall but significantly reduce precision. Ensemble voting strategies mitigate this trade-off, achieving an 11% improvement in F1-score and more consistent performance. The study validates these findings through a practitioner survey and offers guidance for integrating better-performing methods into Mozilla's performance engineering workflow.
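To make the baseline concrete, the following is a minimal sketch of a sliding-window Student's t-test detector. It is not Perfherder's implementation: the function names, the window size of 5, and the threshold of 7.0 are illustrative assumptions chosen for this example.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Absolute Student's t for two samples, using the pooled variance."""
    n1, n2 = len(before), len(after)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled == 0:
        # Both windows are constant: any difference in means is a clean step.
        return 0.0 if mean(before) == mean(after) else float("inf")
    return abs(mean(after) - mean(before)) / sqrt(pooled * (1 / n1 + 1 / n2))


def detect_shifts(series, window=5, threshold=7.0):
    """Return indices where the mean appears to shift (hypothetical helper).

    `window` and `threshold` are assumed values, not Mozilla's settings.
    """
    # Score every index that has a full window on both sides.
    scores = {
        i: t_statistic(series[i - window:i], series[i:i + window])
        for i in range(window, len(series) - window + 1)
    }
    # Keep only local maxima above the threshold so one shift yields one alert.
    return [
        i for i, t in scores.items()
        if t >= threshold and t >= scores.get(i - 1, 0.0) and t > scores.get(i + 1, 0.0)
    ]


# A synthetic series whose mean steps from about 10 to about 12 at index 8.
series = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 9.9, 10.0,
          12.0, 12.1, 11.9, 12.0, 12.2, 12.1, 11.9, 12.0]
shifts = detect_shifts(series)
print(shifts)
```

A single fixed threshold is what makes a detector like this prone to both kinds of error: lower it and noisy series raise false alerts, raise it and small regressions slip through.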

Key takeaway

If you are an MLOps engineer managing performance monitoring in continuous integration, move beyond a single statistical test such as Student's t-test. Explore ensemble change-point detection methods, which in this study achieved an 11% improvement in F1-score and more consistent results than individual detectors. Validate any new detection system against a practitioner-annotated dataset so that its results reflect real-world conditions and earn engineers' trust.

Key insights

Benchmarking a broad set of change-point detection methods and ensembles against annotated ground truth can identify detectors that outperform a traditional statistical test. In this study, ensemble voting achieved an 11% improvement in F1-score.
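Precision, recall, and F1-score for a change-point detector can be computed by matching predicted change points to annotated ones. The sketch below assumes a match counts when a prediction falls within a fixed margin of an annotation, with each annotation matched at most once. The margin of 3 and the function name are assumptions for illustration; the summary does not state the study's matching rule.

```python
def score_detections(predicted, annotated, margin=3):
    """Return (precision, recall, f1) for predicted vs. annotated indices.

    A prediction is a true positive if it lies within `margin` points of an
    annotation that has not been matched yet (assumed matching rule).
    """
    unmatched = set(annotated)
    true_positives = 0
    for p in sorted(predicted):
        # Pick the closest annotation still available within the margin.
        hit = min(
            (a for a in unmatched if abs(a - p) <= margin),
            key=lambda a: abs(a - p),
            default=None,
        )
        if hit is not None:
            unmatched.remove(hit)
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two of three predictions land near an annotation; one annotation is missed.
precision, recall, f1 = score_detections([8, 30, 52], [9, 50, 90])
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a method that gains recall by raising many extra alerts can still end up with a lower F1, which is the trade-off reported for the offline and hybrid methods.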

Method

Construct a practitioner-annotated ground-truth dataset, evaluate diverse CPD methods (offline, online, hybrid), and assess ensemble voting strategies for performance anomaly detection.
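One way to picture ensemble voting is as agreement among detectors: a change point is kept only when enough independent methods report one at roughly the same position. The sketch below is a generic majority-style vote, not the specific strategies evaluated in the study. The function name, the `min_votes` and `tolerance` parameters, and the choice of the lower median as the reported position are all assumptions.

```python
from statistics import median_low


def ensemble_vote(detections, min_votes=2, tolerance=2):
    """Combine change points from several detectors (hypothetical helper).

    `detections` holds one list of indices per detector. Indices whose gaps
    are at most `tolerance` are grouped, and a group is kept when at least
    `min_votes` distinct detectors contributed to it.
    """
    # Tag every index with the detector that produced it, then sort by index.
    tagged = sorted(
        (idx, det) for det, points in enumerate(detections) for idx in points
    )

    clusters, current = [], []
    for idx, det in tagged:
        # Start a new cluster when the gap to the previous index is too large.
        if current and idx - current[-1][0] > tolerance:
            clusters.append(current)
            current = []
        current.append((idx, det))
    if current:
        clusters.append(current)

    agreed = []
    for cluster in clusters:
        voters = {det for _, det in cluster}
        if len(voters) >= min_votes:
            # Report one representative position per agreed cluster.
            agreed.append(median_low([idx for idx, _ in cluster]))
    return agreed


# Three detectors: all agree near index 8, two agree near 30, the rest are lone alerts.
detections = [[8, 30], [9, 50], [7, 31, 70]]
majority = ensemble_vote(detections, min_votes=2)
unanimous = ensemble_vote(detections, min_votes=3)
print(majority, unanimous)
```

Raising `min_votes` trades recall for precision, so the vote threshold gives a single knob for tuning the balance that individual detectors fix implicitly.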

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.