A 0.91 confidence score told me the plate was right. It wasn't.

2026-06-17 · Source: AI Advances - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

A vision model for license plate OCR was systematically overconfident: it misread "QJG659" as "OJG659" and reported 0.91 confidence, even though "O" and "Q" are visually ambiguous. This article addresses two core issues. First, a model's confidence score is often not a true probability, especially on ambiguous inputs like "O"/"Q" pairs. Second, even a corrected score does not by itself dictate a decision. On the first issue, scores can be calibrated globally with temperature scaling or Platt scaling, or corrected per glyph with a confusability matrix derived from fonts or error logs. The author stresses that calibration is an ongoing process: data drift means scores must be logged and the calibration refit continuously. On the second issue, a calibrated score should feed a separate decision policy that can accept, reject, or abstain. Because different error types carry asymmetric costs, that policy needs per-class thresholds and an explicit abstention band.
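
As a rough illustration of the global calibration step, the sketch below fits a single temperature on a held-out set by grid search over negative log-likelihood. It is a minimal sketch, not the author's implementation: the logits, labels, grid range, and function names are all invented for illustration, and a real system would fit on far more data.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the scores)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[label])
    return total / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # assumed range: 0.5 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical held-out logits for two classes: index 0 = "O", index 1 = "Q".
# The model always favors "O" strongly but is wrong on the last sample.
val_logits = [[3.0, 0.5], [2.8, 0.6], [3.1, 0.4], [2.9, 0.7]]
val_labels = [0, 0, 0, 1]

T = fit_temperature(val_logits, val_labels)
raw = max(softmax(val_logits[0]))      # confidence before calibration
cal = max(softmax(val_logits[0], T))   # confidence after temperature scaling
print(f"T={T:.2f} raw={raw:.3f} calibrated={cal:.3f}")
```

Temperature scaling changes only how sure the model claims to be, not which class it picks, which is why the summary treats it as a global adjustment and pairs it with per-glyph corrections.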

Key takeaway

If you are an MLOps engineer deploying vision models where confident misreads are costly, treat model confidence as a raw signal, not a true probability. Calibrate the scores continuously using techniques like temperature scaling or a confusability matrix. Then separate "how sure" from "what to do" by designing a decision policy with explicit accept, reject, and abstain outcomes. Set per-class thresholds based on the asymmetric costs of being wrong, so that errors do not pass silently and expensively.

Key insights

Model confidence scores are not probabilities; calibrate them and separate "how sure" from "what to do" based on error costs.

Principles

Model confidence is a raw signal, not a true probability.
Calibration requires continuous maintenance against drift.
Decision policies must account for asymmetric error costs.
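
One way to act on the second principle is to track a calibration metric over logged predictions and refit when it degrades. The sketch below computes expected calibration error (ECE) with equal-width bins. The source does not name a specific drift metric, so ECE, the bin count, and the 0.05 tolerance are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

def needs_refit(confidences, correct, tolerance=0.05):
    """Flag a window of logged predictions whose ECE exceeds an assumed tolerance."""
    return expected_calibration_error(confidences, correct) > tolerance

# Hypothetical logged window: the model claims 0.9 but is right only half the time.
logged_conf = [0.9] * 10
logged_ok = [True] * 5 + [False] * 5
print(needs_refit(logged_conf, logged_ok))
```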

Method

Calibrate model confidence using temperature/Platt scaling or a confusability matrix (font-derived or error-log learned). Define a decision policy with accept, reject, and abstain outcomes, setting per-class thresholds based on asymmetric error costs.
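
The error-log variant of the confusability matrix can be sketched as follows. This is a hedged illustration only: the log format (pairs of predicted and verified glyphs), the counts, and the choice to multiply per-glyph reliabilities into a plate-level figure are all assumptions, not details taken from the original article.

```python
from collections import defaultdict

def build_confusability(error_log):
    """Estimate P(actual glyph | predicted glyph) from (predicted, actual) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for predicted, actual in error_log:
        counts[predicted][actual] += 1
    matrix = {}
    for predicted, row in counts.items():
        total = sum(row.values())
        matrix[predicted] = {actual: n / total for actual, n in row.items()}
    return matrix

def plate_reliability(plate, matrix, default=1.0):
    """Multiply per-glyph reliabilities; glyphs missing from the log use `default`.

    Treating glyphs as independent and unseen glyphs as fully reliable are
    simplifying assumptions for this sketch.
    """
    reliability = 1.0
    for glyph in plate:
        reliability *= matrix.get(glyph, {}).get(glyph, default)
    return reliability

# Hypothetical verified reads: "O" predictions were really "Q" 3 times out of 10.
log = [("O", "O")] * 6 + [("O", "Q")] * 3 + [("O", "0")] * 1 + [("J", "J")] * 10
matrix = build_confusability(log)

print(matrix["O"])                          # what a predicted "O" tends to be
print(plate_reliability("OJG659", matrix))  # dragged down by the leading "O"
```

A font-derived matrix would have the same shape, filled from glyph similarity rather than from observed errors.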

In practice

Implement temperature or Platt scaling for global calibration.
Define explicit abstain bands for human review.
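
A decision policy with an abstain band can be sketched by choosing the action with the lowest expected cost. The cost figures and class names below are invented for illustration, and the policy assumes the input is already a calibrated probability that the read is correct.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Costs:
    false_accept: float  # cost of accepting a wrong read
    false_reject: float  # cost of rejecting a correct read
    review: float        # fixed cost of sending the read to a human

def decide(p_correct, costs):
    """Return the action with the lowest expected cost."""
    expected = {
        "accept": (1.0 - p_correct) * costs.false_accept,
        "reject": p_correct * costs.false_reject,
        "abstain": costs.review,
    }
    return min(expected, key=expected.get)

def abstain_band(costs):
    """Probability range where human review is the cheapest option.

    The band is empty when the lower bound is not below the upper bound.
    """
    lower = costs.review / costs.false_reject
    upper = 1.0 - costs.review / costs.false_accept
    return lower, upper

# Hypothetical per-class costs: plates containing confusable glyphs such as
# "O"/"Q" are assumed to be far more expensive to accept wrongly.
COSTS_BY_CLASS = {
    "confusable": Costs(false_accept=50.0, false_reject=5.0, review=1.0),
    "default": Costs(false_accept=10.0, false_reject=5.0, review=1.0),
}

print(decide(0.91, COSTS_BY_CLASS["confusable"]))
print(decide(0.91, COSTS_BY_CLASS["default"]))
print(abstain_band(COSTS_BY_CLASS["confusable"]))
```

Under these assumed costs, the same 0.91 score is accepted for an ordinary plate but routed to review for a confusable one, which is the practical meaning of per-class thresholds.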

Topics

Model Calibration
Confidence Scores
Decision Policies
Computer Vision
License Plate Recognition
Error Cost Analysis

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.