LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data
Summary
A study investigates whether large language models (LLMs) can recognize the limits of their knowledge when applied to structured clinical data. Comparing Qwen 2.5 7B with XGBoost through cross-model attribution divergence, the researchers found the LLM's verbalized confidence to be epistemically vacuous: it stayed between 0.856 and 0.937 regardless of actual accuracy, which ranged from 49% to 75.3%. The LLM also exhibited an inverse difficulty effect, reaching only 64.8% accuracy on cases where XGBoost was 99% correct, yet matching XGBoost at 73.8% on cases where XGBoost was moderately uncertain. Crucially, few-shot examples combined with SHAP-derived feature evidence acted super-additively, reducing the Attribution Disagreement Score (ADS) from 1.54 to 0.38 and raising accuracy from 49% to 75.3% without any training. Furthermore, a novel cross-model calibrator built on attribution divergence signals reduced expected calibration error (ECE) from 0.254 to 0.080, providing patient-specific reliability estimates without access to model internals or repeated inference. The work frames the issue as a cold start problem for LLMs on structured data.
Key takeaway
For AI scientists deploying LLMs on structured clinical data, the main lesson is that an LLM's verbalized confidence is unreliable. Rather than trusting the model's self-reported confidence, add few-shot examples and SHAP-derived feature evidence to the prompt, which together substantially improve accuracy. Then use a cross-model calibrator built on attribution divergence to produce patient-specific reliability estimates, addressing the cold start problem and making the model more trustworthy in critical applications.
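The summary does not give the prompt format used in the study. A minimal sketch, assuming each patient is described by raw feature values plus the top-k features ranked by absolute SHAP value from the tabular model (such values could come, for example, from the `shap` library's `TreeExplainer` on an XGBoost model), with labeled patients serving as few-shot examples. `Case`, `build_prompt`, and the text template are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class Case:
    """Hypothetical container: one patient's raw values and per-feature SHAP values."""
    features: Mapping[str, float]
    shap: Mapping[str, float]
    label: Optional[str] = None  # known outcome, set only for few-shot examples


def _describe(case: Case, top_k: int) -> str:
    values = ", ".join(f"{name}={value:g}" for name, value in case.features.items())
    # Keep the top_k features with the largest |SHAP| as textual evidence.
    top = sorted(case.shap.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    lines = []
    for name, v in top:
        # Assumes the positive class is the adverse outcome ("risk").
        direction = "raises" if v > 0 else "lowers" if v < 0 else "does not change"
        lines.append(f"{name} {direction} risk ({v:+.2f})")
    return f"Patient: {values}\nKey evidence: {'; '.join(lines)}"


def build_prompt(question: str, examples: Sequence[Case], query: Case, top_k: int = 3) -> str:
    """Few-shot prompt: labeled example cases first, then the unlabeled query."""
    parts = [question]
    for ex in examples:
        parts.append(f"{_describe(ex, top_k)}\nAnswer: {ex.label}")
    parts.append(f"{_describe(query, top_k)}\nAnswer:")
    return "\n\n".join(parts)


example = Case({"age": 70, "bp": 150}, {"age": 0.4, "bp": -0.1}, label="yes")
query = Case({"age": 40, "bp": 120}, {"age": -0.3, "bp": 0.05})
print(build_prompt("Will the patient be readmitted?", [example], query, top_k=1))
```

Putting the SHAP evidence in plain text keeps the approach training-free, matching the summary's point that the accuracy gain required no training.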
Key insights
LLMs struggle with epistemic self-awareness on structured clinical data, but cross-model attribution divergence can detect blind spots and improve reliability.
Principles
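- LLM verbalized confidence is epistemically vacuous: it stays between 0.856 and 0.937 whatever the actual accuracy.
- LLMs show an inverse difficulty effect: they can underperform on cases that a tabular model finds easy.
- Few-shot examples and SHAP evidence are super-additive: together they help more than the sum of their separate gains.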
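The summary does not say exactly how cases were grouped by difficulty. One way to look for an inverse difficulty effect is to bucket patients by XGBoost's predicted-class probability and compute the LLM's accuracy within each bucket; the effect shows up as LLM accuracy that fails to rise, or falls, where the reference model is most confident. The bucket edges and function name below are hypothetical, and the data is a toy example.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple


def accuracy_by_reference_confidence(
    ref_conf: Sequence[float],
    llm_correct: Sequence[bool],
    edges: Sequence[float] = (0.7, 0.9),
) -> Dict[str, Tuple[float, int]]:
    """Bucket cases by the reference model's (e.g. XGBoost's) confidence and
    return {bucket: (LLM accuracy, number of cases)} for non-empty buckets."""
    if len(ref_conf) != len(llm_correct):
        raise ValueError("inputs must have equal length")
    bounds = [0.0, *edges, 1.0]
    labels = [f"{lo:.2f}-{hi:.2f}" for lo, hi in zip(bounds, bounds[1:])]
    hits = [0] * len(labels)
    counts = [0] * len(labels)
    for c, ok in zip(ref_conf, llm_correct):
        # A value equal to an edge goes into the higher bucket.
        idx = bisect_right(edges, c)
        hits[idx] += int(ok)
        counts[idx] += 1
    return {
        labels[i]: (hits[i] / counts[i], counts[i])
        for i in range(len(labels))
        if counts[i]
    }


# Toy data: the LLM does worst where XGBoost is most confident.
xgb_conf = [0.99, 0.98, 0.97, 0.80, 0.75, 0.60]
llm_ok = [True, False, False, True, True, True]
for bucket, (acc, n) in accuracy_by_reference_confidence(xgb_conf, llm_ok).items():
    print(bucket, f"{acc:.2f}", n)
```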
Method
The study compares Qwen 2.5 7B and XGBoost on clinical prediction tasks using cross-model attribution divergence analysis. A calibrator then uses these divergence signals to provide patient-specific reliability estimates.
In practice
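- Add few-shot examples to LLM prompts to improve accuracy.
- Include SHAP-derived feature evidence from a tabular model such as XGBoost in the prompt.
- Use cross-model calibrators, rather than verbalized confidence, for reliability estimates.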
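To check whether a confidence source, verbalized or calibrated, can be trusted, the summary uses expected calibration error. A minimal sketch of the standard equal-width-bin ECE follows; the choice of 10 bins is an assumption, since the summary does not state the binning used. The toy check mirrors the pattern described above: confidence pinned near 0.9 while accuracy is close to 50%.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B| / n) * |accuracy(B) - mean confidence(B)|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need non-empty inputs of equal length")
    n = len(confidences)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
        # Bin b covers [b/n_bins, (b+1)/n_bins); c == 1.0 joins the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if members:
            acc = sum(bool(correct[i]) for i in members) / len(members)
            conf = sum(confidences[i] for i in members) / len(members)
            ece += len(members) / n * abs(acc - conf)
    return ece


# Toy check: near-constant confidence of 0.9 with only 50% accuracy.
conf = [0.9] * 10
hits = [True, False] * 5
print(round(expected_calibration_error(conf, hits), 3))  # 0.4
```

A well-calibrated source drives this value toward 0, which is the direction of the reported drop from 0.254 to 0.080.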
Topics
- Large Language Models
- Epistemic Uncertainty
- Clinical Data
- Attribution Divergence
- Model Calibration
- SHAP
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.