Expert-Guided Class-Conditional Goodness-of-Fit Scores for Interpretable Classification with Informative Missingness: An Application to Seismic Monitoring
Summary
This research introduces an interpretable classification framework designed to address pervasive informative missingness and integrate partial expert knowledge, specifically applied to seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT). The framework encodes prior knowledge through an expert-guided class-conditional model for one or more classes, using this to construct interpretable goodness-of-fit features. These features quantify data agreement with the expert model, isolating contributions from observed and missing components. Combined with transparent auxiliary summaries, these features feed into a simple discriminative classifier, yielding an easily inspectable decision rule. The method demonstrates strong potential as a transparent screening tool, reducing expert analyst workload. Simulations indicate that this interpretable, expert-guided approach can outperform standard machine learning classifiers, particularly with small training samples, achieving an AUROC of 0.925 and AUPRC of 0.907 in the CTBT application, significantly improving over a baseline logistic regression model's AUROC of 0.865 and AUPRC of 0.856.
Key takeaway
For AI Scientists and Machine Learning Engineers developing classification systems in high-stakes domains with incomplete data, consider integrating domain expertise through expert-guided class-conditional models. Your team should prioritize constructing interpretable goodness-of-fit features, especially those that explicitly account for informative missingness patterns. This approach can yield transparent, justifiable decision rules that are competitive with, or even superior to, black-box models, particularly when training data is limited, while also reducing expert review burdens.
Key insights
Expert-guided, interpretable features derived from class-conditional models improve classification, especially with informative missingness.
Principles
- Informative missingness is a key signal, not a nuisance.
- Partial expert knowledge can be encoded into interpretable features.
- Hybrid generative-discriminative models enhance transparency and performance.
Method
The proposed pipeline involves class-specific fitting of instance-level parameters, expert-guided feature engineering (model-fit scores, auxiliary summaries), and transparent classification on the augmented representation.
In practice
- Decompose likelihoods into detection, non-detection, and observed-value components.
- Normalize score features to ensure comparability across instances.
- Use logistic regression for final classification to maintain interpretability.
Topics
- Informative Missingness
- Interpretable Machine Learning
- Expert-Guided Models
- Goodness-of-Fit Scores
- Seismic Monitoring
Code references
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.