Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, AI Ethics & Safety) · Depth: Expert, extended

Summary

The study "Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation" reports that large language models (LLMs), including Claude, Qwen, and OLMo, share a critical vulnerability when synthesizing information from multiple sources. Across five models and three professional domains (venture capital, marketing, public health), the models reliably detect fabricated statistics when shown them in isolation (correct identification rates of 0.76–1.00) but fail to apply that capability during multi-source synthesis. How much a source influences the output is governed primarily by a "methodology-register gate" that responds to the stylistic presentation of analytical text rather than to the numeric validity of its claims. Mechanistic analyses, including causal tracing and linear probes, indicate that the models encode methodological register as a domain-general representation (probe AUC 0.83–0.92), while numeric-validity signals fall to chance during synthesis. Prompting mitigations, even oracle checklists, induce only blanket skepticism rather than selective discernment. The result is an "epistemic alignment" gap: LLMs trust sources based on apparent credibility, not internal consistency.
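To make the probe figures concrete, here is a minimal sketch of what an AUC value measures: the probability that a randomly chosen positive example receives a higher probe score than a randomly chosen negative one, with 0.5 corresponding to chance. The scores and labels are invented for illustration and are not data from the study, and `roc_auc` is a hypothetical helper, not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: chance that a positive outscores a negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical probe outputs, invented for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A probe whose scores mostly separate the two classes.
register_scores = [0.9, 0.8, 0.7, 0.25, 0.6, 0.3, 0.2, 0.1]
# A probe whose scores carry no information about the label.
validity_scores = [0.5] * 8

register_auc = roc_auc(register_scores, labels)
validity_auc = roc_auc(validity_scores, labels)
print(register_auc, validity_auc)
```

In this toy case the first probe separates most positive/negative pairs, while the second is indistinguishable from guessing, which is the qualitative contrast the summary describes between methodological register and numeric validity during synthesis.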

Key takeaway

If you are an AI scientist or machine learning engineer deploying LLMs in critical decision-making contexts, recognize that current models have "epistemic blind spots." They prioritize the appearance of methodological rigor over the substance of numeric validity, even though they can detect fabrications in isolation. This vulnerability is amplified when a source lacks social consensus. Implement robust human-in-the-loop verification for quantitative data synthesis, because prompting alone fails to induce selective discernment.
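One way to route quantitative claims to a human reviewer is to run a simple arithmetic check before synthesis rather than relying on the model. The sketch below is an assumption, not a method from the study: `QuantClaim`, `needs_human_review`, and the 0.5-point tolerance are hypothetical, and the check catches only one narrow kind of inconsistency (a reported percentage that disagrees with its own counts).

```python
from dataclasses import dataclass


@dataclass
class QuantClaim:
    """A single quantitative claim extracted from a source (hypothetical schema)."""
    source_id: str
    numerator: float
    denominator: float
    reported_percent: float


def needs_human_review(claim: QuantClaim, tolerance: float = 0.5) -> bool:
    """Flag a claim whose reported percentage disagrees with its own counts."""
    if claim.denominator <= 0:
        return True  # cannot be checked arithmetically, so escalate
    implied = 100.0 * claim.numerator / claim.denominator
    return abs(implied - claim.reported_percent) > tolerance


# Invented example claims, for illustration only.
claims = [
    QuantClaim("report_a", 42, 120, 35.0),  # 42/120 is 35%, consistent
    QuantClaim("report_b", 42, 120, 62.0),  # reports 62%, counts imply 35%
]

# Sources whose numbers fail the check go to a person, not to the model.
review_queue = [c.source_id for c in claims if needs_human_review(c)]
print(review_queue)
```

The design choice is to keep the check deterministic and independent of how polished the source reads, since writing style is the signal the summary says models over-weight.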

Key insights

LLMs detect statistical fabrications in isolation but ignore them during multi-source synthesis, prioritizing stylistic credibility.

Principles

Method

Researchers conducted factorial behavioral experiments across five LLM families and three professional domains, using linear probes, causal tracing, and component-level attribution for mechanistic analysis.
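A factorial design crosses every level of each factor so that the effect of one factor can be separated from the others. The sketch below enumerates such a grid. The three domains come from the summary, while the two-level `register` and `numeric_validity` factors and their level names are assumptions made for illustration, since the summary does not list the study's exact factors.

```python
from itertools import product

# Domains named in the summary.
domains = ["venture capital", "marketing", "public health"]
# Assumed factor levels (hypothetical names, not taken from the paper).
register = ["methodological", "plain"]
numeric_validity = ["valid", "fabricated"]

# Full crossing: every domain x register x validity combination appears once.
conditions = [
    {"domain": d, "register": r, "numeric_validity": v}
    for d, r, v in product(domains, register, numeric_validity)
]

print(len(conditions))
for condition in conditions[:4]:
    print(condition)
```

Because each register level appears with both validity levels in every domain, a difference in source influence between "methodological" and "plain" conditions cannot be explained by validity or domain alone.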

In practice

Topics

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.