From prediction sets to calibrated class scores

2026-03-03 · Source: Valeriy’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, short

Summary

Conformal classification is best known for producing prediction sets, but the same procedure also yields a calibrated p-value for every class. Tabular machine learning practitioners often overlook this byproduct, even though it matters. Unlike scikit-learn's `predict_proba` outputs, which carry no formal calibration guarantee, conformal p-values are calibrated in finite samples and without distributional assumptions, provided the calibration and test data are exchangeable. Concretely, for the true label `y` and every threshold `t`, the p-value satisfies `p_y(x) ≤ t` with probability at most `t`. Both outputs come from a single calibration procedure: the prediction set `C_α(x) = {y : p_y(x) > α}` is simply a level-α cut of the conformal p-value vector. That vector of calibrated scores is valuable as input to downstream tabular models in credit, fraud, or churn applications, where cost-sensitive decisions depend on calibrated inputs.
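The guarantee and the set construction can be written out explicitly. The block below is a sketch of the standard split-conformal construction, assuming the hinge nonconformity score listed under In practice and a calibration set of `n` labelled examples `(x_i, y_i)`. The symbol `\hat{\pi}_y(x)` (the model's predicted probability for class `y`) is notation introduced here, not the article's.

```latex
\begin{aligned}
s(x, y) &= 1 - \hat{\pi}_y(x) \\
p_y(x) &= \frac{1 + \bigl|\{\, i \le n : s(x_i, y_i) \ge s(x, y) \,\}\bigr|}{n + 1} \\
\Pr\bigl(p_Y(X) \le t\bigr) &\le t \quad \text{for all } t \in [0, 1] \\
C_\alpha(x) &= \{\, y : p_y(x) > \alpha \,\}, \qquad \Pr\bigl(Y \in C_\alpha(X)\bigr) \ge 1 - \alpha
\end{aligned}
```

The coverage statement on the last line follows from the third line. The true label is excluded from the set exactly when `p_Y(X) ≤ α`, and that happens with probability at most `α`.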

Key takeaway

Machine learning engineers building tabular classifiers should know that conformal prediction produces formally calibrated p-values as a byproduct of generating prediction sets. Unlike `predict_proba` outputs, which lack a calibration guarantee, these scores give downstream models calibrated inputs for critical applications such as fraud detection or ranking. Feed conformal p-value vectors into your pipelines to improve cost-sensitive decisions and meta-models.

Key insights

Conformal classification yields both prediction sets and formally calibrated p-values from one procedure. This fills a critical gap in tabular ML, where the usual class scores carry no calibration guarantee.

Principles

Prediction sets are level-α cuts of conformal p-value vectors.
Conformal p-values are formally calibrated, unlike `predict_proba` outputs.
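As a small illustration of the first principle, the sketch below applies a level-α cut to a single p-value vector. The function name `prediction_set` and the three-class p-values are hypothetical. Any conformal p-value vector would be cut the same way.

```python
import numpy as np

def prediction_set(pvalues: np.ndarray, alpha: float) -> list:
    """Return the class indices whose conformal p-value exceeds alpha (a level-alpha cut)."""
    return [int(k) for k in np.flatnonzero(pvalues > alpha)]

# Hypothetical conformal p-value vector for one row over classes 0, 1, 2.
p = np.array([0.62, 0.09, 0.31])

print(prediction_set(p, alpha=0.10))  # [0, 2]
print(prediction_set(p, alpha=0.05))  # [0, 1, 2]
print(prediction_set(p, alpha=0.40))  # [0]
```

Raising α can only shrink the set. A single shipped p-value vector therefore serves every coverage level a downstream consumer might choose.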

Method

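Hold out a calibration set and compute a nonconformity score for each calibration example. Then derive a conformal p-value for every candidate class of a new input. Under exchangeability these p-values are calibrated by construction, and they can be shipped together as one vector per row.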
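Here is a minimal NumPy sketch of this method. It assumes the hinge nonconformity score and a fitted classifier whose `predict_proba`-style output is available for both the calibration rows and the test rows. The helpers `hinge_scores` and `conformal_pvalues` are hypothetical names, and the Dirichlet draws only stand in for real model outputs.

```python
import numpy as np

def hinge_scores(proba: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Nonconformity score 1 - (predicted probability of the given label), one per row."""
    return 1.0 - proba[np.arange(len(labels)), labels]

def conformal_pvalues(cal_scores: np.ndarray, test_proba: np.ndarray) -> np.ndarray:
    """Split-conformal p-value for every (test row, candidate class) pair.

    p_y(x) = (1 + #{i : s_i >= s(x, y)}) / (n + 1), with s the hinge score.
    """
    n = len(cal_scores)
    sorted_cal = np.sort(cal_scores)
    test_scores = 1.0 - test_proba  # hinge score of each candidate class, shape (m, K)
    # searchsorted(side="left") counts calibration scores strictly below each test score.
    n_ge = n - np.searchsorted(sorted_cal, test_scores, side="left")
    return (1.0 + n_ge) / (n + 1.0)

# Synthetic stand-ins for predict_proba output from a fitted model (hypothetical data).
rng = np.random.default_rng(0)
K, n_cal, n_test = 3, 500, 5
cal_proba = rng.dirichlet(np.ones(K), size=n_cal)
cal_labels = rng.integers(0, K, size=n_cal)
test_proba = rng.dirichlet(np.ones(K), size=n_test)

# 1) Score the held-out calibration set. 2) Turn every candidate class into a p-value.
pvals = conformal_pvalues(hinge_scores(cal_proba, cal_labels), test_proba)
print(pvals.shape)  # (5, 3): one calibrated p-value vector per test row
```

The calibration scores are sorted once, and `np.searchsorted` then does the counting in O(log n) per (row, class) pair. This helps when scoring large tabular batches.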

In practice

Ship conformal p-value vectors to downstream tabular models.
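Use the hinge nonconformity score `1 - p_true(x_i)` for calibration in infrequent-event problems. Here `p_true(x_i)` is the model's predicted probability for the true label of calibration example `x_i`.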
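One way to put both bullets into practice is sketched below. The helpers `split_conformal_pvalues`, `add_pvalue_features`, and `empirical_miscalibration` are hypothetical. The simulated features and probabilities stand in for a real model, and appending p-values as extra columns is only one possible way to feed a downstream model. The diagnostic uses labelled held-out rows to check that the true-label p-value is at most `t` in roughly a fraction `t` of rows or fewer.

```python
import numpy as np

def split_conformal_pvalues(cal_proba, cal_labels, proba):
    """Split-conformal p-values using the hinge score 1 - p_true(x_i)."""
    cal_scores = np.sort(1.0 - cal_proba[np.arange(len(cal_labels)), cal_labels])
    n = len(cal_scores)
    # For each (row, class): number of calibration scores >= that candidate's hinge score.
    n_ge = n - np.searchsorted(cal_scores, 1.0 - proba, side="left")
    return (1.0 + n_ge) / (n + 1.0)

def add_pvalue_features(X, pvals):
    """Append one conformal p-value column per class to the downstream feature matrix."""
    return np.hstack([X, pvals])

def empirical_miscalibration(pvals, labels, thresholds):
    """Share of rows whose true-label p-value is <= t; should stay near or below t."""
    p_true = pvals[np.arange(len(labels)), labels]
    return np.array([np.mean(p_true <= t) for t in thresholds])

rng = np.random.default_rng(42)
K = 3  # number of classes

def simulate(n):
    """Hypothetical stand-in for tabular features, predict_proba output, and labels."""
    X = rng.normal(size=(n, 4))
    proba = rng.dirichlet(np.ones(K), size=n)
    # Draw each label from its row's probabilities (inverse-CDF sampling).
    u = rng.random((n, 1))
    labels = np.minimum((u > np.cumsum(proba, axis=1)).sum(axis=1), K - 1)
    return X, proba, labels

X_cal, proba_cal, y_cal = simulate(1_000)
X_new, proba_new, y_new = simulate(20_000)

pvals = split_conformal_pvalues(proba_cal, y_cal, proba_new)
X_aug = add_pvalue_features(X_new, pvals)  # 4 original columns + K p-value columns
print(X_aug.shape)  # (20000, 7)
# Each entry should be close to or below the matching threshold.
print(empirical_miscalibration(pvals, y_new, [0.05, 0.1, 0.2]))
```

The calibration set and the new rows come from the same simulated distribution, so they are exchangeable, which is the condition the guarantee needs. On real data, a drifting production distribution would show up in this check as rates well above the thresholds.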

Topics

Conformal Prediction
Calibrated Class Scores
Prediction Sets
Tabular Machine Learning
Conformal p-values

Best for: Machine Learning Engineer, Data Scientist, MLOps Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Valeriy’s Substack.