The ninety-percent confidence that means nothing — and what does

2026-03-03 · Source: Valeriy’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

The article introduces conformal prediction as a fix for the way machine learning classifiers misrepresent their confidence, particularly through the outputs of `softmax` or `predict_proba`. These outputs are often read as true probabilities, but nothing guarantees that a stated confidence of, say, 0.92 matches the model's actual accuracy across many such predictions. Conformal prediction instead provides a formal, finite-sample guarantee: the returned set of labels contains the true label with at least a user-chosen probability (e.g., 90%), provided the data are exchangeable. The inductive (split) version of the method has four steps. First, split the labeled data into a training set and a calibration set. Second, score each calibration point with a nonconformity measure. Third, take a finite-sample-corrected quantile of those scores as a threshold. Finally, for each new input, build the prediction set from every label whose score falls at or below that threshold. The technique is model-agnostic and offers a more dependable alternative to miscalibrated softmax outputs in critical applications.

Key takeaway

For AI engineers deploying classifiers in critical applications, `predict_proba` alone is not a reliable measure of confidence. You should implement conformal prediction to obtain prediction sets with a formal coverage guarantee, or calibrated probability intervals. This makes the model's uncertainty explicit and checkable, which supports more robust decision-making, better triage rules, and auditable outputs for regulatory compliance.

Key insights

Softmax outputs are not true probabilities and misrepresent classifier confidence; conformal prediction offers reliable coverage guarantees.
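One way to test this claim on your own model is to bin its top-class confidences and compare each bin's mean confidence with its empirical accuracy. The sketch below assumes NumPy and an `(n, K)` array of `predict_proba` outputs. The function name `reliability_table`, the ten equal-width bins, and the expected calibration error (ECE) summary are illustrative choices, not the article's.

```python
import numpy as np

def reliability_table(probs, labels, n_bins=10):
    """Compare mean top-class confidence with empirical accuracy per bin.

    probs: (n, K) array, e.g. a classifier's predict_proba output.
    labels: (n,) array of true integer classes.
    Returns (rows, ece): rows holds (bin_low, bin_high, mean_confidence,
    accuracy, count) for each non-empty bin; ece is the count-weighted
    mean gap |accuracy - mean_confidence| (expected calibration error).
    """
    conf = probs.max(axis=1)                  # stated confidence of the top label
    pred = probs.argmax(axis=1)               # predicted label
    correct = (pred == labels).astype(float)  # 1.0 where the prediction is right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; bin 0 also takes conf == 0.
    idx = np.clip(np.digitize(conf, edges[1:-1], right=True), 0, n_bins - 1)
    rows, ece = [], 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        mean_conf = conf[mask].mean()
        acc = correct[mask].mean()
        rows.append((edges[b], edges[b + 1], mean_conf, acc, int(mask.sum())))
        ece += mask.mean() * abs(acc - mean_conf)
    return rows, ece
```

A model whose confidence means what it says shows accuracy close to mean confidence in every populated bin; a bin that claims 0.95 but is right half the time is exactly the failure the article describes.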

Principles

Softmax outputs are not calibrated probabilities.
Conformal prediction offers finite-sample validity.
Exchangeability is the sole assumption.
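A compact statement of the finite-sample guarantee follows. The notation is standard but ours rather than the article's: s is the nonconformity score, s_(k) is the k-th smallest of the n calibration scores, and α is the miscoverage rate (α = 0.1 for 90% coverage). If the n calibration points and the test point are exchangeable and k ≤ n, then:

```latex
s_i = s(X_i, Y_i), \qquad i = 1, \dots, n
k = \left\lceil (n+1)(1-\alpha) \right\rceil, \qquad \hat{q} = s_{(k)}
C(x) = \{\, y : s(x, y) \le \hat{q} \,\}
1 - \alpha \;\le\; \Pr\!\left( Y_{n+1} \in C(X_{n+1}) \right) \;\le\; 1 - \alpha + \frac{1}{n+1}
```

The lower bound is the coverage guarantee itself. The upper bound additionally assumes the scores are almost surely distinct (no ties). The probability is marginal: it averages over the draw of the calibration set and the test point, not over one fixed input. When k > n, the usual convention is q̂ = ∞, so every label is included.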

Method

The inductive conformal prediction method involves splitting the data, scoring calibration points with a nonconformity measure, computing a quantile of those scores to serve as the threshold, and constructing prediction sets for new inputs.
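The four steps can be sketched in NumPy as follows. This is a minimal illustration, not the article's code. It assumes the nonconformity score is one minus the probability the model assigns to the true label (a common choice; the article's exact measure is not given here). The names `conformal_threshold` and `prediction_sets` are hypothetical, and Dirichlet-sampled probabilities stand in for a trained model's `predict_proba` on held-out data.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Steps 2-3: score calibration points and return the threshold q_hat.

    Assumed nonconformity score: 1 - probability of the true label.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return np.inf
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, q_hat):
    """Step 4: label j is in row i's set when its score 1 - p_ij is <= q_hat."""
    return (1.0 - test_probs) <= q_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, n_cal, n_test = 5, 1000, 5000
    # Synthetic stand-in for predict_proba outputs on held-out data.
    probs = rng.dirichlet(np.ones(K), size=n_cal + n_test)
    # Sample each true label from its row of probabilities.
    u = rng.random((n_cal + n_test, 1))
    labels = np.minimum((u > probs.cumsum(axis=1)).sum(axis=1), K - 1)

    q_hat = conformal_threshold(probs[:n_cal], labels[:n_cal], alpha=0.1)
    sets = prediction_sets(probs[n_cal:], q_hat)
    covered = sets[np.arange(n_test), labels[n_cal:]].mean()
    print(f"q_hat={q_hat:.3f} coverage={covered:.3f} "
          f"mean set size={sets.sum(axis=1).mean():.2f}")
```

Step 1 (training the model on a separate split) is assumed to have happened already; only its outputs on the calibration and test points are needed, which is what makes the procedure model-agnostic. Empirical coverage on the test points should land near or above 90%, and the mean set size shows how much uncertainty the guarantee costs.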

In practice

Bin `predict_proba` outputs by confidence and compare each bin's empirical accuracy with its mean confidence.
Deploy prediction sets that list the candidate labels with a guaranteed coverage rate.
Use Venn-Abers predictors for calibrated probability intervals.
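For the Venn-Abers item, the sketch below follows the standard binary inductive construction, not code from the article. It fits an isotonic regression to the calibration scores twice, once with the test point provisionally labeled 0 and once labeled 1, and reads off both fits at the test score. The scores can come from any classifier (for example, `predict_proba(X_cal)[:, 1]`). The function names are hypothetical, and the hand-written pool-adjacent-violators routine only avoids a dependency.

```python
import numpy as np

def _pav(y, w):
    """Weighted pool-adjacent-violators: best non-decreasing fit to y (sorted by x)."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi))
        wts.append(float(wi))
        sizes.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w_new = wts[-2] + wts[-1]
            v_new = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w_new
            size_new = sizes[-2] + sizes[-1]
            vals[-2:] = [v_new]
            wts[-2:] = [w_new]
            sizes[-2:] = [size_new]
    return np.repeat(vals, sizes)

def _isotonic_value_at(x, y, x0):
    """Isotonic regression of y on x, evaluated at x0 (x0 must occur in x).

    Tied x values are pooled first so they share one fitted value.
    """
    ux, inv = np.unique(x, return_inverse=True)
    inv = inv.ravel()
    w = np.bincount(inv).astype(float)
    ybar = np.bincount(inv, weights=y) / w
    fit = _pav(ybar, w)
    return float(fit[np.searchsorted(ux, x0)])

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Binary inductive Venn-Abers: return (p0, p1) for P(label = 1)."""
    x = np.append(np.asarray(cal_scores, dtype=float), test_score)
    y = np.asarray(cal_labels, dtype=float)
    p0 = _isotonic_value_at(x, np.append(y, 0.0), test_score)  # test labeled 0
    p1 = _isotonic_value_at(x, np.append(y, 1.0), test_score)  # test labeled 1
    return p0, p1

def merge_interval(p0, p1):
    """One common way to collapse [p0, p1] into a single probability."""
    return p1 / (1.0 - p0 + p1)
```

The width of `[p0, p1]` shrinks as the calibration set grows, so a wide interval is itself a signal that the stated probability rests on little evidence. `sklearn.isotonic.IsotonicRegression` could replace the hand-written fit if scikit-learn is already a dependency.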

Topics

Conformal Prediction
Prediction Sets
Model Calibration
Softmax Output
Venn-Abers Predictors

Best for: Machine Learning Engineer, MLOps Engineer, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Valeriy’s Substack.