LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

Source: Artificial Intelligence · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, quick

Summary

The study examines whether large language models (LLMs) can recognize the limits of their knowledge when applied to structured clinical data. Comparing Qwen 2.5 7B with XGBoost through cross-model attribution divergence, the researchers found the LLM's verbalized confidence to be epistemically vacuous: it stayed between 0.856 and 0.937 while actual accuracy ranged from 49% to 75.3%. The LLM also showed an inverse difficulty effect, reaching only 64.8% accuracy on cases where XGBoost was 99% correct, yet matching XGBoost at 73.8% on cases where XGBoost was moderately uncertain. Few-shot examples combined with SHAP-derived feature evidence acted as a super-additive intervention, reducing the Attribution Disagreement Score (ADS) from 1.54 to 0.38 and raising accuracy from 49% to 75.3% without any training. A novel cross-model calibrator that uses attribution divergence signals reduced expected calibration error (ECE) from 0.254 to 0.080, giving patient-specific reliability estimates without internal model access or repeated inference. The work frames the underlying issue as a cold start problem for LLMs on structured data.
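
To make the calibration figures concrete, here is a minimal sketch of the standard binned definition of expected calibration error. The summary does not say which binning scheme or bin count the study used, so the function name, the equal-width bins, and the default of 10 bins are assumptions for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    n_bins: number of equal-width bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence-accuracy gap by its share of predictions.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Under this definition, a model that reports 0.9 on every case while being right half the time scores 0.4, which is the kind of gap that flat verbalized confidence produces.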

Key takeaway

AI scientists deploying LLMs on structured clinical data should treat the model's verbalized confidence as unreliable. Rather than relying on it, add few-shot examples and SHAP-derived feature evidence to raise accuracy, and use a cross-model calibrator built on attribution divergence to produce patient-specific reliability estimates. Together these steps address the cold start problem and make the model more trustworthy in critical applications.
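
As an illustration of the intervention described above, the following sketch assembles a prompt from few-shot cases plus the strongest SHAP-derived features. The summary does not give the study's prompt template, so the wording, the function names `format_features` and `build_prompt`, and the `top_k` cutoff are all hypothetical.

```python
def format_features(features):
    """Render a feature dict as 'name=value' pairs in a stable order."""
    return ", ".join(f"{name}={value}" for name, value in sorted(features.items()))


def build_prompt(patient, few_shot, shap_values, top_k=3):
    """Build a prompt with few-shot cases and top-k SHAP feature evidence.

    patient: feature dict for the case to predict.
    few_shot: list of (feature dict, label) pairs shown as worked examples.
    shap_values: feature name -> SHAP value from a separate tabular model.
    """
    lines = ["Predict the outcome (0 or 1) for the final patient.", ""]

    # Few-shot block: labelled cases the LLM can pattern-match against.
    for features, label in few_shot:
        lines.append(f"Patient: {format_features(features)}")
        lines.append(f"Outcome: {label}")
        lines.append("")

    # Evidence block: the features the tabular model weighted most heavily.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines.append("Feature evidence from a tree model (SHAP values):")
    for name, value in ranked[:top_k]:
        direction = "raises" if value > 0 else "lowers"
        lines.append(f"- {name} {direction} the predicted risk ({value:+.2f})")
    lines.append("")

    lines.append(f"Patient: {format_features(patient)}")
    lines.append("Outcome:")
    return "\n".join(lines)
```

The SHAP values would come from the tabular model (XGBoost in the study); here they are passed in as a plain dictionary so the sketch stays free of external dependencies.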

Key insights

LLMs struggle with epistemic self-awareness on structured clinical data, but cross-model attribution divergence can detect blind spots and improve reliability.

Method

The study compares Qwen 2.5 7B and XGBoost on clinical prediction tasks using cross-model attribution divergence analysis. A calibrator then uses these divergence signals to provide patient-specific reliability estimates.
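
The idea of comparing two models' attributions for the same patient can be sketched as follows. The summary does not define how the Attribution Disagreement Score is computed, so this is a stand-in rather than the study's metric: it normalizes each model's absolute attributions to sum to 1 and takes the L1 distance between them. The function names and the way LLM attributions are obtained are assumptions.

```python
def normalize_attributions(attributions):
    """Scale absolute attribution values so they sum to 1."""
    total = sum(abs(v) for v in attributions.values())
    if total == 0:
        raise ValueError("attributions must contain at least one non-zero value")
    return {name: abs(v) / total for name, v in attributions.items()}


def attribution_divergence(attr_a, attr_b):
    """L1 distance between two normalized attribution profiles.

    Illustrative stand-in for a cross-model disagreement score, not the
    study's ADS. Ranges from 0.0 (identical emphasis) to 2.0 (no overlap).
    """
    a = normalize_attributions(attr_a)
    b = normalize_attributions(attr_b)
    features = set(a) | set(b)
    # Features missing from one model count as zero attribution there.
    return sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in features)
```

A calibrator in the spirit of the study would then take a per-patient score like this as an input feature when estimating how far to trust the LLM's prediction for that patient.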

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.