Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
Summary
A study conducted at ING, a large international bank, presents a predictive incident risk scoring approach designed to prevent IT incidents caused by changes in highly regulated financial environments. The research compares the bank's existing rule-based risk assessment with three gradient-boosted machine learning models (HGBC, a histogram-based gradient boosting classifier; LightGBM; and XGBoost) on a one-year dataset of 175,000 closed change tickets and their linked priority 1 and priority 2 incidents. LightGBM was the best-performing model, especially when enriched with aggregated team metrics such as change success rates and incident counts. The approach emphasizes auditability and explainability: SHAP values give feature-level insight into each prediction, so decisions stay transparent and traceable, which is critical for compliance with frameworks such as the Digital Operational Resilience Act (DORA) and the EU AI Act. According to the study, this data-driven method significantly outperforms the baseline rule-based system at identifying high-risk changes, enabling proactive risk mitigation and more reliable IT operations.
Key takeaway
For CTOs and VPs of Engineering managing IT change in regulated sectors, the study suggests that data-driven ML models such as LightGBM can assess incident risk better than traditional rule-based systems. Integrate explainable AI (XAI) techniques, such as SHAP, into your change management workflows to ensure auditability and build trust. Doing so lets you identify high-risk changes ahead of time, so your teams can apply targeted scrutiny and preventive actions, improving operational resilience and compliance with regulations such as DORA and the EU AI Act.
Key insights
Interpretable ML models can effectively predict IT incident risk from changes in regulated environments, outperforming rule-based systems.
Principles
- Explainability is crucial for regulatory compliance and user trust.
- Aggregated team metrics enhance predictive model accuracy.
- High recall is prioritized over precision in incident prevention.
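One way to make the recall-over-precision principle concrete is to choose the alerting threshold from a recall target rather than from a fixed cut-off such as 0.5. The sketch below is a minimal illustration under stated assumptions: the scores, labels, and the helper names `precision_recall` and `threshold_for_recall` are all hypothetical and do not come from the study.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when changes scoring >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def threshold_for_recall(scores, labels, target_recall):
    """Pick the highest threshold whose recall meets the target.

    Recall can only grow as the threshold drops, so the first threshold
    that reaches the target (scanning from high to low) is the highest one.
    """
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall(scores, labels, t)
        if recall >= target_recall:
            return t
    return min(scores)  # target unreachable: flag everything


# Hypothetical model scores and ground truth (1 = change caused an incident).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 0, 1, 0, 0, 0]

threshold = threshold_for_recall(scores, labels, target_recall=1.0)
precision, recall = precision_recall(scores, labels, threshold)
print(f"threshold={threshold} precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, catching every incident-inducing change means accepting some false alarms, which is the trade-off the principle describes.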
Method
Train boosted tree classifiers (LightGBM performed best) on historical change and incident data, enriched with aggregated team metrics. Score each planned IT change with the trained model, and use SHAP to explain each score at the feature level.
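The enrichment step can be sketched as follows. The summary does not say how the team aggregates were computed, so everything here is an assumption: the ticket layout, the feature names, and the choice to use only tickets closed earlier than the one being scored (a common guard against label leakage).

```python
from collections import defaultdict
from datetime import date

# Hypothetical closed change tickets:
# (close_date, team, change_succeeded, caused_p1_or_p2_incident)
history = [
    (date(2024, 1, 5), "payments", True, False),
    (date(2024, 1, 9), "payments", False, True),
    (date(2024, 1, 12), "payments", True, False),
    (date(2024, 1, 7), "mobile", True, False),
]


def add_team_features(tickets):
    """Attach per-team aggregates computed only from earlier tickets."""
    totals = defaultdict(lambda: {"changes": 0, "successes": 0, "incidents": 0})
    rows = []
    for closed, team, succeeded, caused_incident in sorted(tickets, key=lambda t: t[0]):
        agg = totals[team]
        rows.append({
            "close_date": closed,
            "team": team,
            # None marks "no history yet" for this team.
            "team_success_rate": (
                agg["successes"] / agg["changes"] if agg["changes"] else None
            ),
            "team_prior_incidents": agg["incidents"],
            "label": int(caused_incident),
        })
        # Update the running totals only after the row has been emitted,
        # so a ticket never contributes to its own features.
        agg["changes"] += 1
        agg["successes"] += int(succeeded)
        agg["incidents"] += int(caused_incident)
    return rows


rows = add_team_features(history)
for row in rows:
    print(row)
```

Rows like these would then feed a boosted tree classifier, with `label` as the target. Teams with no history get `None` here; how to encode that missing value is left to the modelling step.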
In practice
- Use SHAP to explain model predictions to engineers.
- Incorporate team performance metrics into risk models.
- Prioritize recall so that most incident-inducing changes are flagged.
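To show what a feature-level explanation looks like, the sketch below computes exact Shapley values by brute force for a toy scoring function. This is not the `shap` library and not the study's model: `toy_risk_score`, the feature names, and the baseline values are hypothetical stand-ins, and a single baseline change replaces the background dataset a real SHAP explainer would use. Brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features inside a coalition take the instance's value; the rest take
    the baseline's value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    contributions = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {feature}) - value(members))
        contributions[feature] = total
    return contributions


def toy_risk_score(x):
    """Hypothetical hand-written score standing in for a trained model."""
    score = 0.05
    score += 0.30 * (1.0 - x["team_success_rate"])
    score += 0.02 * x["team_prior_incidents"]
    if x["is_emergency"] and x["team_success_rate"] < 0.9:
        score += 0.10  # interaction: emergency change by a weaker team
    return score


change = {"team_success_rate": 0.60, "team_prior_incidents": 5, "is_emergency": 1}
baseline = {"team_success_rate": 0.95, "team_prior_incidents": 1, "is_emergency": 0}

contrib = shapley_values(toy_risk_score, change, baseline)
for name, delta in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>22}: {delta:+.3f}")
```

The contributions sum to the difference between the score for this change and the score for the baseline, which is the property that makes such explanations traceable: an engineer or auditor can see how much each feature moved the risk score.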
Topics
- Predictive Incident Prevention
- IT Change Management
- Explainable AI
- LightGBM Model
- Financial Sector Regulation
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Machine Learning Engineer, MLOps Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.