‘Probably’ doesn’t mean the same thing to your AI as it does to you

2026-04-17 · Source: AIhub · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

A study published in npj Complexity by Mayank Kejriwal and colleagues finds that large language models (LLMs) such as ChatGPT often interpret uncertainty differently from humans, particularly for words of estimative probability such as "maybe," "probably," and "likely." LLMs agree with humans on extremes like "impossible" but diverge significantly on hedge words; for instance, an LLM might interpret "likely" as an 80% probability, while humans place it closer to 65%. This discrepancy stems from LLMs averaging conflicting usages in their training data, whereas humans rely on contextual cues. The research also found that LLM estimates are sensitive to gendered wording and to the language of the prompt: estimates were more rigid with "she" prompts and shifted when prompts changed from English to Chinese, reflecting biases and linguistic differences in the training data.

Key takeaway

For AI developers and product managers integrating LLMs into critical applications, understanding that an AI's "probably" may not align with human intuition is crucial. You should prioritize developing and deploying models with explicit, consistent uncertainty quantification to prevent miscommunication in fields like healthcare or government policy. Ensure your models are rigorously tested for biases related to gendered or cross-lingual prompts, as these can significantly alter probability estimates and erode user trust.

Key insights

AI chatbots map words of estimative probability to different numbers than humans do, which poses risks in high-stakes applications.

Principles

AI uncertainty communication diverges from human intuition.
Training data biases impact AI probability estimates.
Contextual cues are vital for human probability interpretation.

Method

The study compared how AI models and humans map words of estimative probability (e.g., "maybe," "probably") to numerical percentages, analyzing divergences and sensitivities to gendered and cross-lingual prompts.
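
The comparison described above can be sketched roughly as follows, assuming each word has already been mapped to a percentage by both the model and human respondents. The function name `divergence_by_word` and the data layout are hypothetical and not taken from the study. Only the "likely" figures (80 vs. 65) come from the summary above; the "impossible" values are placeholders standing in for an extreme where the two agree.

```python
def divergence_by_word(model_estimates, human_estimates):
    """Return the model-minus-human gap, in percentage points, for each shared word."""
    shared = sorted(set(model_estimates) & set(human_estimates))
    return {word: model_estimates[word] - human_estimates[word] for word in shared}


# "likely" uses the figures quoted in the summary; "impossible" is an
# illustrative placeholder, not a value reported by the study.
model_estimates = {"likely": 80.0, "impossible": 0.0}
human_estimates = {"likely": 65.0, "impossible": 0.0}

gaps = divergence_by_word(model_estimates, human_estimates)

# The word with the largest absolute gap is where model and human readings differ most.
largest = max(gaps, key=lambda word: abs(gaps[word]))

print(gaps)
print(largest)
```

A positive gap means the model reads the word as more certain than people do, so a sketch like this would point at hedge words rather than extremes.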

In practice

Measure how consistently a model maps probability words to numbers across prompt variants.
Scrutinize AI communication in high-stakes domains like healthcare.
Address gender and linguistic biases in AI training data.
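
One way to approach the checks listed above is to query the same probability word under several prompt variants and flag words whose estimates move too much. The sketch below is a minimal illustration under that assumption. The function names, the variant labels (`en_he`, `en_she`, `zh`), the 5-point tolerance, and all numbers are hypothetical and are not results from the study.

```python
def estimate_spread(estimates_by_variant):
    """Return the max-minus-min spread, in percentage points, across prompt variants."""
    values = list(estimates_by_variant.values())
    return max(values) - min(values)


def flag_inconsistent(word_estimates, tolerance=5.0):
    """Return the words whose spread across variants exceeds the tolerance."""
    return sorted(
        word
        for word, by_variant in word_estimates.items()
        if estimate_spread(by_variant) > tolerance
    )


# Hypothetical estimates for illustration only; real values would come from
# prompting the model with each variant and parsing its numeric answer.
word_estimates = {
    "probably": {"en_he": 70.0, "en_she": 72.0, "zh": 61.0},
    "impossible": {"en_he": 0.0, "en_she": 0.0, "zh": 1.0},
}

flagged = flag_inconsistent(word_estimates)
print(flagged)
```

The range is used here because it is easy to read in percentage points. A standard deviation or a per-variant comparison against human baselines would be reasonable alternatives.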

Topics

Large Language Models
Uncertainty Communication
Human-AI Alignment
AI Bias
AI Safety

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Ethicist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AIhub.