Expert-Guided Class-Conditional Goodness-of-Fit Scores for Interpretable Classification with Informative Missingness: An Application to Seismic Monitoring

2026-04-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Environmental Science & Earth Systems · Depth: Expert, extended

Summary

This research introduces an interpretable classification framework designed to address pervasive informative missingness and integrate partial expert knowledge, specifically applied to seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT). The framework encodes prior knowledge through an expert-guided class-conditional model for one or more classes, using this to construct interpretable goodness-of-fit features. These features quantify data agreement with the expert model, isolating contributions from observed and missing components. Combined with transparent auxiliary summaries, these features feed into a simple discriminative classifier, yielding an easily inspectable decision rule. The method demonstrates strong potential as a transparent screening tool, reducing expert analyst workload. Simulations indicate that this interpretable, expert-guided approach can outperform standard machine learning classifiers, particularly with small training samples, achieving an AUROC of 0.925 and AUPRC of 0.907 in the CTBT application, significantly improving over a baseline logistic regression model's AUROC of 0.865 and AUPRC of 0.856.

Key takeaway

For AI Scientists and Machine Learning Engineers developing classification systems in high-stakes domains with incomplete data, consider integrating domain expertise through expert-guided class-conditional models. Your team should prioritize constructing interpretable goodness-of-fit features, especially those that explicitly account for informative missingness patterns. This approach can yield transparent, justifiable decision rules that are competitive with, or even superior to, black-box models, particularly when training data is limited, while also reducing expert review burdens.

Key insights

Expert-guided, interpretable features derived from class-conditional models improve classification, especially with informative missingness.

Principles

Informative missingness is a key signal, not a nuisance.
Partial expert knowledge can be encoded into interpretable features.
Hybrid generative-discriminative models enhance transparency and performance.

Method

The proposed pipeline involves class-specific fitting of instance-level parameters, expert-guided feature engineering (model-fit scores, auxiliary summaries), and transparent classification on the augmented representation.

In practice

Decompose likelihoods into detection, non-detection, and observed-value components.
Normalize score features to ensure comparability across instances.
Use logistic regression for final classification to maintain interpretability.

Topics

Informative Missingness
Interpretable Machine Learning
Expert-Guided Models
Goodness-of-Fit Scores
Seismic Monitoring

Code references

Cohen-Shahar/class-conditional-gof

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.