A 0.91 confidence score told me the plate was right. It wasn't.

Source: AI Advances - Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, long

Summary

A vision model for license plate OCR was systematically overconfident: it misread "QJG659" as "OJG659" with a confidence of 0.91 because the two glyphs are visually ambiguous. The article addresses two core issues. First, a model's confidence score is often not a true probability, especially on ambiguous inputs such as "O"/"Q" pairs. Second, even a corrected score does not by itself dictate a decision. For the first issue, scores can be calibrated globally with temperature scaling or Platt scaling, or per glyph with a confusability matrix derived from fonts or learned from error logs. The author stresses that calibration is an ongoing process: because of data drift, reads must be logged and the calibration refit continuously. For the second issue, a calibrated score should feed a separate decision policy that can accept, reject, or abstain. Because different error types carry asymmetric costs, this leads to per-class thresholds and an explicit abstention band.
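The per-glyph correction mentioned above can be sketched as follows. This is a minimal illustration, assuming the error log is a list of (predicted glyph, true glyph) pairs from reviewed reads; the function names, the smoothing constant, and the toy counts are hypothetical and not taken from the original article.

```python
from collections import Counter, defaultdict


def build_confusion(log):
    """Count how often each predicted glyph turned out to be each true glyph.

    `log` is an iterable of (predicted_glyph, true_glyph) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for predicted, true in log:
        counts[predicted][true] += 1
    return counts


def posterior(counts, predicted, smoothing=1.0):
    """Estimate P(true glyph | predicted glyph) with additive smoothing.

    Smoothing is applied over the glyphs seen for this prediction, plus the
    predicted glyph itself, so an unseen prediction still returns a distribution.
    """
    row = counts.get(predicted, Counter())
    glyphs = set(row) | {predicted}
    total = sum(row.values()) + smoothing * len(glyphs)
    return {g: (row[g] + smoothing) / total for g in glyphs}


# Toy log (made-up counts): the model said "O" 100 times, and review found
# that 20 of those plates actually showed a "Q".
toy_log = [("O", "O")] * 80 + [("O", "Q")] * 20 + [("J", "J")] * 100
confusion = build_confusion(toy_log)
o_posterior = posterior(confusion, "O")
```

On this toy log, `o_posterior["O"]` comes out near 0.79, which is the kind of per-glyph evidence a single global rescaling of the score cannot express. Refitting is just rebuilding the counts from a fresh log window.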

Key takeaway

If you are an MLOps engineer deploying vision models where confident misreads are costly, treat model confidence as a raw signal, not a true probability. Calibrate your model's scores continuously using techniques such as temperature scaling or a confusability matrix. Separate "how sure" from "what to do" by designing a decision policy with explicit accept, reject, and abstain outcomes, and set per-class thresholds from the asymmetric costs of being wrong to prevent silent, expensive errors.
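A decision policy of this shape might look like the sketch below. Everything specific in it is an assumption for illustration: the cost figures, the set of confusable glyphs, the reject cutoff, and the rule that a plate takes the worst decision among its characters are hypothetical, not the author's implementation. The accept threshold is the break-even point where the expected cost of a wrong accept, (1 - p) times its cost, equals the cost of abstaining.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    ABSTAIN = "abstain"  # e.g. route the read to human review
    REJECT = "reject"


def accept_threshold(cost_wrong_accept: float, cost_review: float) -> float:
    """Smallest calibrated confidence at which accepting is no costlier than abstaining."""
    return 1.0 - cost_review / cost_wrong_accept


@dataclass(frozen=True)
class Band:
    accept_at: float     # accept at or above this calibrated confidence
    reject_below: float  # reject below this; abstain in between


# Hypothetical costs: a wrong accept on a confusable glyph is assumed to be
# five times as expensive as a wrong accept on any other glyph.
DEFAULT_BAND = Band(accept_at=accept_threshold(10.0, 1.0), reject_below=0.50)
CONFUSABLE_BAND = Band(accept_at=accept_threshold(50.0, 1.0), reject_below=0.50)
BANDS = {glyph: CONFUSABLE_BAND for glyph in "OQ0"}


def decide_char(glyph: str, calibrated_conf: float) -> Decision:
    band = BANDS.get(glyph, DEFAULT_BAND)
    if calibrated_conf >= band.accept_at:
        return Decision.ACCEPT
    if calibrated_conf < band.reject_below:
        return Decision.REJECT
    return Decision.ABSTAIN


def decide_plate(chars) -> Decision:
    """A plate takes the worst decision among its (glyph, confidence) pairs."""
    decisions = [decide_char(glyph, conf) for glyph, conf in chars]
    if Decision.REJECT in decisions:
        return Decision.REJECT
    if Decision.ABSTAIN in decisions:
        return Decision.ABSTAIN
    return Decision.ACCEPT


# Made-up per-character confidences for the misread plate from the article.
plate = [("O", 0.91), ("J", 0.99), ("G", 0.99), ("6", 0.99), ("5", 0.99), ("9", 0.99)]
result = decide_plate(plate)
```

With these assumed costs, a 0.91 on "O" falls inside the abstention band, so `result` is `Decision.ABSTAIN`, while the same 0.91 on a non-confusable glyph would be accepted. Keeping the thresholds in data rather than in code makes them easy to revise when the costs change.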

Key insights

Model confidence scores are not probabilities; calibrate them and separate "how sure" from "what to do" based on error costs.

Principles

Method

Calibrate model confidence using temperature/Platt scaling or a confusability matrix (font-derived or error-log learned). Define a decision policy with accept, reject, and abstain outcomes, setting per-class thresholds based on asymmetric error costs.
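As a rough illustration of the global calibration step, the sketch below fits a single temperature by grid search over the negative log-likelihood on a held-out set. It assumes access to the model's raw logits and to ground-truth labels; the grid bounds, function names, and toy data are hypothetical, and a production fit would more likely use a proper optimizer.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens the scores)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[label] + 1e-12)
    return total / len(labels)


def fit_temperature(logit_rows, labels, lo=0.5, hi=10.0, steps=96):
    """Pick the temperature on an evenly spaced grid that minimizes held-out NLL."""
    candidates = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy held-out set (made up): the model always emits logits [4.0, 0.0],
# i.e. about 0.98 raw confidence in class 0, but is right only 3 times in 4.
held_out_logits = [[4.0, 0.0]] * 4
held_out_labels = [0, 0, 0, 1]

T = fit_temperature(held_out_logits, held_out_labels)
raw_conf = softmax([4.0, 0.0])[0]
calibrated_conf = softmax([4.0, 0.0], T)[0]
```

On this toy set the fitted temperature is well above 1, and `calibrated_conf` lands close to the observed 0.75 accuracy instead of the raw 0.98. Because temperature scaling divides every logit by the same constant, it never changes which class is predicted, only how confident the score claims to be.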

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.