Teaching AI models to say “I’m not sure”

2026-04-22 · Source: MIT News - Artificial intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, short

Summary

MIT CSAIL researchers have developed a training method, Reinforcement Learning with Calibration Rewards (RLCR), that makes AI confidence estimates substantially more reliable without sacrificing task performance. Reported by MIT News on April 22, 2026, the technique addresses a core flaw in how current reasoning models are trained: they are rewarded for correct answers and penalized for wrong ones, with no signal about how confident they should be, which leads to overconfidence. RLCR adds a Brier score term to the reward function, penalizing the model when its stated confidence is miscalibrated. In experiments, RLCR reduced calibration error by up to 90% while maintaining or improving accuracy across multiple benchmarks, including datasets not seen during training. The resulting language models produce a calibrated confidence score alongside each answer, making them better suited to critical applications in fields such as medicine, law, and finance.

Key takeaway

AI product managers deploying models in critical domains such as finance or medicine should prioritize models trained with methods like RLCR. Calibrated confidence estimates reduce the risk that users act on outputs that are stated confidently but are wrong, which can strengthen trust and safety in AI-driven applications.

Key insights

RLCR training improves AI confidence calibration by penalizing miscalibrated certainty, reducing overconfidence without accuracy loss.

Principles

Standard RL training actively degrades calibration.
Reasoning about uncertainty itself holds value.

Method

RLCR adds a Brier score term to the reinforcement learning reward function. The term penalizes the squared gap between the confidence a model states and whether its answer turns out to be correct, so the model is trained to reason about both the problem and its own uncertainty.
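
To make the idea concrete, here is a minimal Python sketch of a reward with a Brier term. The source says only that a Brier score is added to the reward, so the exact form used here (correctness minus the squared error between stated confidence and the 0/1 outcome) is an assumption, as are all function names. It is an illustration, not the researchers' implementation.

```python
def binary_reward(correct: bool) -> float:
    """Correctness-only reward: the same for any stated confidence."""
    return 1.0 if correct else 0.0


def calibration_reward(correct: bool, confidence: float) -> float:
    """Assumed form: correctness minus the Brier penalty (confidence - outcome)^2."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    outcome = 1.0 if correct else 0.0
    return outcome - (confidence - outcome) ** 2


def expected_calibration_reward(p_correct: float, confidence: float) -> float:
    """Expected reward when the answer is right with probability p_correct."""
    return (p_correct * calibration_reward(True, confidence)
            + (1.0 - p_correct) * calibration_reward(False, confidence))


def best_confidence(p_correct: float, steps: int = 100) -> float:
    """Grid-search the stated confidence that maximizes the expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_calibration_reward(p_correct, q))


if __name__ == "__main__":
    # A confident wrong answer is punished; a confident right answer is not.
    print(calibration_reward(False, 1.0), calibration_reward(True, 1.0))
    # The reward-maximizing confidence tracks the true chance of being right.
    print(best_confidence(0.7))
```

Under these assumptions, a model that is right 70% of the time maximizes its expected reward by stating a confidence of 0.7, whereas the correctness-only reward is indifferent to whatever confidence it states.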

In practice

Use RLCR for models in high-stakes decision environments.
When sampling several candidate answers, select the one with the highest self-reported confidence.
Include explicit uncertainty reasoning in classifier inputs.
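
Selecting among candidates by self-reported confidence, as listed above, can be sketched in a few lines of Python. The `Candidate` type and `pick_most_confident` function are hypothetical names introduced for illustration, and the sketch assumes each sampled answer arrives with a confidence in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one sampled answer and its stated confidence."""
    answer: str
    confidence: float  # self-reported probability of being correct, in [0, 1]


def pick_most_confident(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest self-reported confidence."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() keeps the first candidate when several share the top confidence.
    return max(candidates, key=lambda c: c.confidence)


if __name__ == "__main__":
    samples = [
        Candidate("12", 0.55),
        Candidate("15", 0.90),
        Candidate("12", 0.60),
    ]
    print(pick_most_confident(samples).answer)
```

This kind of selection is only as good as the confidence scores it relies on, which is why it is paired here with calibration-focused training.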

Topics

Reinforcement Learning with Calibration Rewards
AI Model Calibration
Language Model Uncertainty
AI Overconfidence
Brier Score

Best for: AI Engineer, Research Scientist, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT News - Artificial intelligence.