Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models
Summary
Researchers from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) propose an adaptive conformal prediction framework to improve the factual accuracy of Large Language Model (LLM) generations. Existing conformal prediction methods for LLMs often apply a single global uncertainty threshold, which can over-cover some prompts and under-cover others as difficulty and other prompt characteristics vary. The new approach extends conformal score transformation methods to LLMs, enabling prompt-dependent calibration. It maintains the marginal coverage guarantee while significantly improving conditional coverage across diverse inputs. The framework supports selective prediction: unreliable claims or answer choices can be filtered out in applications such as long-form generation and multiple-choice question answering. Evaluated on white-box models such as Mistral-7B-Instruct-v0.2, Llama-3.1-8B-Instruct, and Gemma-3-12B-Instruct across various domains, the adaptive method consistently outperforms baselines in conditional coverage.
Key takeaway
Research scientists developing reliable LLM applications should consider adaptive conformal prediction to mitigate factual errors. The approach offers more stable uncertainty quantification across heterogeneous prompts and tasks, improving conditional coverage without sacrificing the marginal guarantee. Adapting calibration to input characteristics helps build more trustworthy LLM systems, especially in high-risk domains where factual accuracy is paramount.
Key insights
Adaptive conformal prediction improves LLM factuality by calibrating uncertainty thresholds based on prompt characteristics, enhancing conditional coverage.
Principles
- Marginal coverage does not guarantee conditional coverage.
- Input-dependent score normalization improves calibration.
- Decompose long-form text into atomic, verifiable claims.
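The first principle can be illustrated with a toy simulation. This is a minimal sketch and not the paper's experiment: the two prompt groups ("easy" and "hard"), their Gaussian score distributions, and the function names are all assumptions made for illustration. A single threshold calibrated on the pooled scores hits the target rate on average, yet over-covers one group and under-covers the other.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Order statistic used by split conformal prediction at level 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample-corrected rank
    return float("inf") if k > n else sorted(scores)[k - 1]

def draw(rng):
    """Return (is_hard, score). Assumption: hard prompts have 3x wider scores."""
    is_hard = rng.random() < 0.5
    return is_hard, abs(rng.gauss(0.0, 3.0 if is_hard else 1.0))

def simulate(n_cal=2000, n_test=20000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # One global threshold, calibrated on pooled scores from both groups.
    threshold = conformal_quantile([draw(rng)[1] for _ in range(n_cal)], alpha)
    test = [draw(rng) for _ in range(n_test)]

    def coverage(rows):
        # Fraction of rows whose score falls at or below the threshold.
        return sum(score <= threshold for _, score in rows) / len(rows)

    return {
        "marginal": coverage(test),
        "easy": coverage([r for r in test if not r[0]]),
        "hard": coverage([r for r in test if r[0]]),
    }

report = simulate()
print(report)
```

In this setup the marginal rate lands near the 90% target, while the easy group is covered almost always and the hard group falls well short of it.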
Method
The method trains a conditional quantile estimator on prompt embeddings and nonconformity scores, then calibrates transformed scores to derive a prompt-adaptive filtering threshold for LLM outputs.
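The two-stage recipe above can be sketched on synthetic data. Everything specific here is an assumption for illustration, not the paper's implementation: the 1-D feature `x` stands in for a reduced prompt embedding, a per-bin empirical quantile stands in for the learned conditional quantile estimator, and the transformation is taken to be the residual `score - q_hat(x)`.

```python
import math
import random

ALPHA = 0.1
N_BINS = 10

def conformal_quantile(values, alpha):
    """Order statistic used by split conformal prediction at level 1 - alpha."""
    n = len(values)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else sorted(values)[k - 1]

def draw(rng, n):
    """Toy (x, score) pairs; the score spread grows with the feature x."""
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, abs(rng.gauss(0.0, 0.5 + 3.0 * x))))
    return rows

def bin_index(x):
    return min(int(x * N_BINS), N_BINS - 1)

def fit_binned_quantile(rows, alpha):
    """Stand-in quantile estimator: one empirical quantile per feature bin.
    Assumes every bin receives at least one sample."""
    bins = [[] for _ in range(N_BINS)]
    for x, score in rows:
        bins[bin_index(x)].append(score)
    table = [sorted(b)[int((1 - alpha) * (len(b) - 1))] for b in bins]
    return lambda x: table[bin_index(x)]

rng = random.Random(0)
cal1, cal2, test = draw(rng, 2000), draw(rng, 2000), draw(rng, 20000)

# Stage 1: fit the conditional quantile estimator on cal1 only.
q_hat = fit_binned_quantile(cal1, ALPHA)
# Stage 2: conformal calibration of the transformed scores on cal2.
offset = conformal_quantile([s - q_hat(x) for x, s in cal2], ALPHA)

def threshold(x):
    """Prompt-adaptive threshold: estimated quantile plus calibrated offset."""
    return q_hat(x) + offset

def coverage(rows):
    return sum(s <= threshold(x) for x, s in rows) / len(rows)

results = {
    "marginal": coverage(test),
    "easy": coverage([r for r in test if r[0] < 0.5]),
    "hard": coverage([r for r in test if r[0] >= 0.5]),
}
print(results)
```

Because `q_hat` is fit on data disjoint from `cal2`, the transformed scores on `cal2` and on test points stay exchangeable, which is what keeps the marginal guarantee intact while the threshold varies with the input.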
In practice
- Use Claim Conditioned Probability (CCP) for claim-level uncertainty.
- Apply PCA to reduce prompt embedding dimensionality.
- Split data into three sets: cal1, cal2, and test.
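The three-way split in the last item might look like the sketch below. The helper name `three_way_split`, the 40/40/20 proportions, and the roles assigned to each part are assumptions: presumably cal1 fits the quantile estimator, cal2 calibrates the transformed scores, and test is held out for evaluation.

```python
import random

def three_way_split(items, frac_cal1=0.4, frac_cal2=0.4, seed=0):
    """Shuffle once, then cut into disjoint cal1 / cal2 / test partitions."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n1 = int(len(idx) * frac_cal1)
    n2 = int(len(idx) * frac_cal2)
    cal1 = [items[i] for i in idx[:n1]]
    cal2 = [items[i] for i in idx[n1:n1 + n2]]
    test = [items[i] for i in idx[n1 + n2:]]  # whatever remains
    return cal1, cal2, test

# Hypothetical prompt identifiers, used only to show the partition sizes.
prompts = [f"prompt-{i}" for i in range(10)]
cal1, cal2, test = three_way_split(prompts)
print(len(cal1), len(cal2), len(test))
```

A single random shuffle keeps the three parts disjoint and exchangeable, which split conformal calibration relies on.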
Topics
- Adaptive Conformal Prediction
- Large Language Models
- Factuality Assessment
- Hallucination Detection
- Conditional Coverage
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.