‘Probably’ doesn’t mean the same thing to your AI as it does to you

Source: AIhub · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

A study published in npj Complexity by Mayank Kejriwal and colleagues finds that large language models (LLMs) such as ChatGPT often diverge from humans in how they interpret words of estimative probability such as "maybe," "probably," and "likely." LLMs agree with humans on extremes like "impossible" but diverge significantly on hedge words; for instance, an LLM might interpret "likely" as an 80% probability, while humans place it closer to 65%. The discrepancy is attributed to LLMs averaging over conflicting usages in their training data, whereas humans rely on contextual cues. The research also found that LLMs are sensitive to gendered wording and to the language of the prompt: probability estimates became more rigid with "she" prompts and shifted when prompts changed from English to Chinese, reflecting biases and linguistic differences in the training data.

Key takeaway

For AI developers and product managers integrating LLMs into critical applications, understanding that an AI's "probably" may not align with human intuition is crucial. You should prioritize developing and deploying models with explicit, consistent uncertainty quantification to prevent miscommunication in fields like healthcare or government policy. Ensure your models are rigorously tested for biases related to gendered or cross-lingual prompts, as these can significantly alter probability estimates and erode user trust.
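One way to approach the testing recommended above is to ask the same question under several prompt variants and measure how far the numeric estimates move. The sketch below is only an illustration: `estimate_spread`, `stub_model`, and the prompt template are hypothetical names, and the stub returns made-up numbers with no empirical meaning. In a real check, `ask` would wrap an actual model call.

```python
from typing import Callable, Sequence


def estimate_spread(
    ask: Callable[[str], float],
    template: str,
    variants: Sequence[str],
) -> float:
    """Largest gap, in percentage points, between estimates across variants."""
    estimates = [ask(template.format(subject=variant)) for variant in variants]
    return max(estimates) - min(estimates)


def stub_model(prompt: str) -> float:
    """Stand-in for a real model call; the values are placeholders only."""
    return 70.0 if prompt.startswith("She") else 75.0


# Hypothetical template: only the pronoun changes between variants.
template = (
    "{subject} will probably arrive on time. "
    "What percentage does 'probably' imply?"
)
spread = estimate_spread(stub_model, template, ["He", "She", "They"])
print(f"Spread across pronoun variants: {spread:.1f} points")
```

The same function could be reused for cross-lingual checks by passing translated templates instead of pronoun variants; what counts as an acceptable spread is a product decision the source does not specify.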

Key insights

AI chatbots interpret words of estimative probability differently from humans, which poses risks in high-stakes applications.

Method

The study compared how AI models and humans map words of estimative probability (e.g., "maybe," "probably") to numerical percentages, analyzing divergences and sensitivities to gendered and cross-lingual prompts.
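The comparison described above can be sketched as a small calculation: collect numeric estimates for each word from both groups, then compare the averages. This is a minimal illustration and not the study's actual procedure. The data below are hypothetical placeholders; only the approximate gap for "likely" (about 80% versus 65%) comes from the summary, and `mean_gap` is an assumed helper name.

```python
from statistics import mean

# Hypothetical elicited estimates (percent) per word. Only the "likely"
# gap (about 80 vs. 65) reflects the summary; the rest are placeholders.
human = {"impossible": [0, 1, 0], "maybe": [40, 50, 45], "likely": [65, 60, 70]}
llm = {"impossible": [0, 0, 0], "maybe": [50, 55, 60], "likely": [80, 80, 80]}


def mean_gap(human_estimates, model_estimates):
    """Return {word: model mean minus human mean} in percentage points."""
    return {
        word: mean(model_estimates[word]) - mean(human_estimates[word])
        for word in human_estimates
        if word in model_estimates
    }


gaps = mean_gap(human, llm)
# Print the words with the largest absolute divergence first.
for word, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{word:>10}: {gap:+.1f} points")
```

With these placeholder values, the hedge words show the largest gaps and the extreme word almost none, which mirrors the pattern the summary describes.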

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Ethicist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AIhub.