The ninety-percent confidence that means nothing — and what does
Summary
The article introduces conformal prediction as a remedy for the way machine learning classifiers misrepresent their confidence, particularly through `softmax` or `predict_proba` outputs. These scores are often read as probabilities, but nothing guarantees that a stated confidence of 0.92 matches the model's actual accuracy across many such predictions. Conformal prediction instead returns a set of labels with a formal, finite-sample guarantee: the set contains the true label at a user-chosen coverage rate (e.g., 90%), provided the calibration and test data are exchangeable. The inductive version works in four steps: split the data into training and calibration sets, score each calibration point with a nonconformity measure, take a quantile of those scores as the threshold, and build a prediction set for each new input from the labels whose scores fall at or below that threshold. The technique is model-agnostic and offers a robust alternative to relying on miscalibrated softmax outputs in critical applications.
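For reference, the threshold and the guarantee behind the four steps above can be written as follows. The notation is standard but assumed here rather than taken from the article: n calibration points with nonconformity scores s_i = s(x_i, y_i), miscoverage level α (so 1 − α is the target coverage), and s_(k) the k-th smallest calibration score.

```latex
\hat{q} = s_{(k)}, \quad k = \left\lceil (n+1)(1-\alpha) \right\rceil, \qquad
C(x_{n+1}) = \{\, y : s(x_{n+1}, y) \le \hat{q} \,\}, \qquad
\Pr\!\left( y_{n+1} \in C(x_{n+1}) \right) \ge 1 - \alpha
```

For example, with n = 1000 and α = 0.1, k = ⌈1001 × 0.9⌉ = 901. The probability is taken over the random draw of calibration and test points, so the guarantee is marginal (an average over many predictions) rather than a promise about any single input; if k > n, the threshold is infinite and every label is included.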
Key takeaway
For AI Engineers deploying classifiers in critical applications, `predict_proba` alone is an unreliable measure of confidence. Use conformal prediction to obtain prediction sets with formally guaranteed coverage, or Venn-Abers predictors for calibrated probability intervals. Either makes the model's uncertainty explicit and checkable, which supports more robust decision-making, better triage rules, and auditable outputs for regulatory compliance.
Key insights
Softmax outputs are not true probabilities and misrepresent classifier confidence; conformal prediction offers reliable coverage guarantees.
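Whether a stated confidence means anything can be checked empirically by binning predictions by confidence and comparing each bin's mean confidence with its accuracy. The following is a minimal pure-Python sketch, assuming you already have each prediction's top-class probability (for example, the row maximum of `predict_proba`) and a flag for whether the prediction was correct. The function names are hypothetical, and the summary statistic (expected calibration error, ECE) is a common choice that the article summary does not name.

```python
from collections import defaultdict


def reliability_table(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    confidences: top-class probability per prediction, each in [0, 1].
    correct:     1 (or True) if the prediction was right, else 0.
    Returns (bin_low, bin_high, count, mean_confidence, accuracy) per non-empty bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = defaultdict(list)
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin, not an extra one.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    table = []
    for idx in sorted(bins):
        members = bins[idx]
        count = len(members)
        mean_conf = sum(c for c, _ in members) / count
        accuracy = sum(ok for _, ok in members) / count
        table.append((idx / n_bins, (idx + 1) / n_bins, count, mean_conf, accuracy))
    return table


def expected_calibration_error(table):
    """Count-weighted average of |accuracy - mean confidence| over the bins."""
    total = sum(count for _, _, count, _, _ in table)
    return sum(count / total * abs(acc - conf) for _, _, count, conf, acc in table)


conf = [0.95, 0.93, 0.91, 0.55, 0.65]  # made-up top-class confidences
hit = [1, 0, 0, 1, 0]                  # made-up correctness flags
table = reliability_table(conf, hit)
for low, high, count, mean_conf, acc in table:
    print(f"{low:.1f}-{high:.1f}  n={count}  conf={mean_conf:.2f}  acc={acc:.2f}")
print(expected_calibration_error(table))
```

On real data you would use thousands of predictions; a well-calibrated model shows accuracy close to mean confidence in every bin, while the toy "0.9 to 1.0" bin here is right only a third of the time.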
Principles
- Softmax outputs are not calibrated probabilities.
- Conformal prediction offers finite-sample validity.
- Exchangeability of calibration and test data is the sole assumption.
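Finite-sample validity under exchangeability can be checked with a small simulation. This sketch assumes i.i.d. uniform scores as a stand-in for any fixed model's nonconformity scores (i.i.d. draws are exchangeable), and the helper names are hypothetical. With n = 50 and α = 0.1 the rank is k = ⌈51 × 0.9⌉ = 46, so for continuous, tie-free scores the exact coverage is 46/51 ≈ 0.902, and the simulated fraction should land near it.

```python
import math
import random


def conformal_threshold(cal_scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score,
    or infinity when that rank exceeds n (too few calibration points)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(cal_scores)[k - 1]


def simulate_coverage(n_cal=50, alpha=0.1, trials=10_000, seed=0):
    """Fraction of trials in which a fresh test score is at or below the threshold.

    Each trial draws a new calibration set and one new test score, all i.i.d.
    uniform, so calibration and test scores are exchangeable.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        cal = [rng.random() for _ in range(n_cal)]
        q_hat = conformal_threshold(cal, alpha)
        covered += rng.random() <= q_hat
    return covered / trials


print(simulate_coverage())
```

Swapping the uniform draws for any other continuous distribution leaves the coverage unchanged, because only the ranks of the scores matter.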
Method
Inductive conformal prediction splits the data, scores the calibration points with a nonconformity measure, takes a quantile of those scores as the threshold, and constructs prediction sets for new inputs.
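The four steps can be sketched for a classifier in a few lines of pure Python. This is a minimal sketch under stated assumptions: the nonconformity score 1 − p̂(y | x) is one common choice, not necessarily the article's; the probability rows stand in for `predict_proba` output from a model fitted on a separate training split; and `calibrate` and `prediction_set` are hypothetical helper names.

```python
import math


def calibrate(cal_probs, cal_labels, alpha=0.1):
    """Steps 2-3: score the calibration points and return the threshold q_hat.

    cal_probs:  class-probability rows (e.g. predict_proba output) from a model
                trained on a separate split (step 1), one row per calibration point.
    cal_labels: integer true labels for those points.
    Assumed nonconformity score: 1 - predicted probability of the true label.
    """
    scores = sorted(1.0 - row[y] for row, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample-corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def prediction_set(row, q_hat):
    """Step 4: keep every label whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(row) if 1.0 - p <= q_hat]


# Toy usage with made-up probabilities for a 3-class problem.
p_true = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3]
cal_probs = [[p, 1.0 - p, 0.0] for p in p_true]  # true label is class 0 here
cal_labels = [0] * len(p_true)
q_hat = calibrate(cal_probs, cal_labels, alpha=0.2)  # k = ceil(11 * 0.8) = 9
print(prediction_set([0.5, 0.45, 0.05], q_hat))   # -> [0, 1]
print(prediction_set([0.98, 0.01, 0.01], q_hat))  # -> [0]
```

With this score a prediction set can hold several labels or, occasionally, none; larger sets flag inputs the model finds harder, which is what makes them useful for triage.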
In practice
- Bin `predict_proba` outputs to check empirical accuracy.
- Deploy prediction sets that return candidate labels with guaranteed coverage.
- Use Venn-Abers for calibrated probability intervals.
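The Venn-Abers suggestion can be illustrated with a minimal inductive Venn-Abers sketch for binary classification, assuming real-valued calibration scores (for example, the positive-class column of `predict_proba`) and 0/1 labels. It fits isotonic regression twice, once pretending the test label is 0 and once pretending it is 1, and reads off the two fitted values at the test score as the interval (p0, p1). The helper names are hypothetical, and refitting from scratch for every test point is the simplest approach, not the fastest.

```python
def isotonic_fit(points):
    """Pool-adjacent-violators: least-squares non-decreasing fit of y on x.

    points: list of (x, y) pairs. Returns a dict mapping each distinct x to its
    fitted value; points with equal x are pooled before fitting.
    """
    # Collapse tied x values into one weighted point: [x, sum_y, count].
    merged = []
    for x, y in sorted(points):
        if merged and merged[-1][0] == x:
            merged[-1][1] += y
            merged[-1][2] += 1
        else:
            merged.append([x, y, 1])
    # Each block is [sum_y, weight, xs]; merge backwards while block means decrease.
    blocks = []
    for x, sum_y, weight in merged:
        blocks.append([sum_y, weight, [x]])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            sum_y2, weight2, xs2 = blocks.pop()
            blocks[-1][0] += sum_y2
            blocks[-1][1] += weight2
            blocks[-1][2].extend(xs2)
    return {x: s / w for s, w, xs in blocks for x in xs}


def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Inductive Venn-Abers for binary labels: return (p0, p1), the isotonic
    calibrated probabilities at test_score under each hypothetical test label."""
    base = list(zip(cal_scores, cal_labels))
    p0 = isotonic_fit(base + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(base + [(test_score, 1)])[test_score]
    return p0, p1


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # made-up calibration scores
labels = [0, 0, 1, 0, 1, 0, 1, 1]                  # made-up binary labels
print(venn_abers_interval(scores, labels, 0.65))   # -> (0.4, 1.0)
```

The interval is wide here because the calibration set has only eight points; its width is itself a signal of how much the calibrated probability can be trusted.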
Topics
- Conformal Prediction
- Prediction Sets
- Model Calibration
- Softmax Output
- Venn-Abers Predictors
Best for: Machine Learning Engineer, MLOps Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Valeriy’s Substack.