Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models

2026-04-16 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

Researchers from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) propose an adaptive conformal prediction framework to improve the factual accuracy of Large Language Model (LLM) generations. Existing conformal prediction methods for LLMs often use a single, global uncertainty threshold, which can lead to inconsistent coverage (over- or under-coverage) for prompts of varying difficulty or characteristics. The new approach extends conformal score transformation methods to LLMs, enabling prompt-dependent calibration. This method maintains marginal coverage guarantees while significantly enhancing conditional coverage across diverse inputs. The framework supports selective prediction, allowing unreliable claims or answer choices to be filtered out in applications like long-form generation and multiple-choice question answering. Evaluated on white-box models such as Mistral-7B-Instruct-v0.2, Llama-3.1-8B-Instruct, and Gemma-3-12B-Instruct across various domains, the adaptive method consistently outperforms baselines in conditional coverage.

Key takeaway

Research scientists building reliable LLM applications should consider adaptive conformal prediction as a way to reduce factual errors. The approach gives more stable uncertainty quantification across heterogeneous prompts and tasks, improving conditional coverage while preserving the marginal coverage guarantee. Adapting calibration to input characteristics can make LLM systems more trustworthy, especially in high-risk domains where factual accuracy is paramount.

Key insights

Adaptive conformal prediction improves LLM factuality by calibrating uncertainty thresholds based on prompt characteristics, enhancing conditional coverage.

Principles

Marginal coverage does not guarantee conditional coverage.
Input-dependent score normalization improves calibration.
Decompose long-form text into atomic, verifiable claims.
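
The first principle can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the paper's experiment: the "easy"/"hard" prompt groups, the Gaussian score model, and the helper names (conformal_quantile, draw, coverage) are all assumptions made for illustration. A single global threshold is calibrated on pooled scores, then coverage is measured overall and per group.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score.
    The rank is clamped to n here; strictly, a rank above n means +infinity."""
    n = len(scores)
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(scores)[k - 1]

def draw(rng, n):
    """Hypothetical data: half 'easy' prompts (small scores), half 'hard' (large)."""
    rows = []
    for i in range(n):
        group = "easy" if i % 2 == 0 else "hard"
        scale = 1.0 if group == "easy" else 3.0
        rows.append((group, abs(rng.gauss(0.0, scale))))
    return rows

rng = random.Random(0)
alpha = 0.1                      # target miscoverage, i.e. 90% coverage
calib = draw(rng, 2000)
test = draw(rng, 4000)

# One global threshold, calibrated on all prompts pooled together.
threshold = conformal_quantile([s for _, s in calib], alpha)

def coverage(rows):
    """Fraction of rows whose score falls at or below the global threshold."""
    return sum(s <= threshold for _, s in rows) / len(rows)

marginal = coverage(test)
easy = coverage([r for r in test if r[0] == "easy"])
hard = coverage([r for r in test if r[0] == "hard"])
print(f"marginal={marginal:.3f} easy={easy:.3f} hard={hard:.3f}")
```

Under these assumptions the pooled coverage sits near the 90% target, while the easy group is over-covered and the hard group is under-covered. That gap is what prompt-dependent calibration is meant to close.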

Method

The method trains a conditional quantile estimator on prompt embeddings and nonconformity scores, then calibrates transformed scores to derive a prompt-adaptive filtering threshold for LLM outputs.
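
The pipeline described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a scalar feature x stands in for the prompt embedding, a per-bin empirical quantile (the hypothetical BinnedQuantile class) stands in for the conditional quantile estimator, and dividing the score by the estimated quantile is one assumed choice of score transformation. The data is synthetic.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(scores)[k - 1]

def draw(rng, n):
    """Hypothetical data: x in [0, 1) stands in for a reduced prompt embedding;
    the spread of the nonconformity score grows with x (harder prompts)."""
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, abs(rng.gauss(0.0, 0.5 + 3.0 * x))))
    return rows

class BinnedQuantile:
    """Stand-in conditional quantile estimator: empirical quantile per bin of x.
    Assumes every bin receives at least one training point."""
    def __init__(self, n_bins=10, level=0.9):
        self.n_bins, self.level = n_bins, level
        self.q = []

    def _bin(self, x):
        return min(self.n_bins - 1, int(x * self.n_bins))

    def fit(self, rows):
        buckets = [[] for _ in range(self.n_bins)]
        for x, s in rows:
            buckets[self._bin(x)].append(s)
        self.q = [sorted(b)[int(self.level * (len(b) - 1))] for b in buckets]
        return self

    def predict(self, x):
        return self.q[self._bin(x)]

rng = random.Random(0)
alpha = 0.1
cal1, cal2, test = draw(rng, 2000), draw(rng, 2000), draw(rng, 4000)

# Step 1: fit the quantile estimator on cal1 only.
est = BinnedQuantile(level=1 - alpha).fit(cal1)

# Step 2: transform cal2 scores (assumed transform: divide by the estimate)
# and take the conformal quantile of the transformed scores.
t = conformal_quantile([s / est.predict(x) for x, s in cal2], alpha)

# Step 3: the threshold for a new prompt scales with its estimated quantile.
def threshold(x):
    return t * est.predict(x)

def coverage(rows):
    return sum(s <= threshold(x) for x, s in rows) / len(rows)

marginal = coverage(test)
low = coverage([r for r in test if r[0] < 0.3])    # easier prompts
high = coverage([r for r in test if r[0] >= 0.7])  # harder prompts
print(f"marginal={marginal:.3f} low={low:.3f} high={high:.3f}")
```

Because the transformed scores are calibrated on data the estimator never saw, the marginal guarantee of split conformal prediction still applies, while the per-prompt scaling pulls coverage for easy and hard prompts closer to the target. How the threshold is then used to keep or drop individual claims is not shown here.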

In practice

Use Claim Conditioned Probability (CCP) for claim-level uncertainty.
Apply PCA to reduce prompt embedding dimensionality.
Split data into three sets: cal1, cal2, and test.
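
The last two items can be sketched together. This is a minimal, standard-library-only illustration with assumed details: the split fractions, the synthetic 5-dimensional "embeddings", and the helper names (three_way_split, pca_fit, pca_transform) are hypothetical, and the PCA uses plain power iteration rather than a library routine. Fitting PCA on cal1 only is a design choice assumed here so that cal2 and test are processed identically. CCP scoring is not shown.

```python
import math
import random

def three_way_split(items, frac_cal1=0.4, frac_cal2=0.4, seed=0):
    """Shuffle once, then cut into disjoint cal1 / cal2 / test parts."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    a = int(frac_cal1 * len(items))
    b = a + int(frac_cal2 * len(items))
    pick = lambda ids: [items[i] for i in ids]
    return pick(idx[:a]), pick(idx[a:b]), pick(idx[b:])

def pca_fit(X, k, iters=200):
    """Approximate top-k principal directions via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _component in range(k):
        v = [1.0 / math.sqrt(d)] * d          # deterministic starting vector
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for this direction.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate: remove the found direction before searching for the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps

def pca_transform(X, mean, comps):
    """Center with the cal1 mean and project onto the fitted directions."""
    return [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in comps]
            for r in X]

# Synthetic stand-in for prompt embeddings: one high-variance axis, four noisy ones.
rng = random.Random(0)
embeddings = [[rng.gauss(0, 5.0)] + [rng.gauss(0, 0.5) for _ in range(4)]
              for _ in range(500)]

cal1, cal2, test = three_way_split(embeddings)
mean, comps = pca_fit(cal1, k=2)              # fit the projection on cal1 only
cal2_low = pca_transform(cal2, mean, comps)
test_low = pca_transform(test, mean, comps)
print(len(cal1), len(cal2), len(test), len(test_low[0]))
```

In a real pipeline the embeddings would come from an encoder and a library PCA would normally replace the hand-rolled one; the point of the sketch is only the order of operations: split first, fit the projection on cal1, then apply it unchanged to cal2 and test.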

Topics

Adaptive Conformal Prediction
Large Language Models
Factuality Assessment
Hallucination Detection
Conditional Coverage

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.