Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference
Summary
This paper introduces a unified framework for establishing finite-sample, distribution-free upper bounds on False Discovery Proportions (FDP) in conformal inference. Existing methods control only the expected FDP and require thresholds to be fixed in advance. This approach instead provides high-probability bounds that hold simultaneously across all possible rejection thresholds, so thresholds can be selected post hoc without invalidating the statistical guarantees. The method constructs a high-probability envelope for the empirical distribution function of "null" conformal p-values by sampling from their joint distribution. It is applied to outlier detection and conformal selection. In synthetic and real-data experiments, including a drug-target interaction task using the DAVIS dataset, the resulting bounds are valid and tighter than those of previous approaches.
Key takeaway
For data scientists and research scientists performing multiple testing with conformal p-values, this framework allows you to select rejection thresholds adaptively after inspecting the data, without sacrificing statistical validity. You gain rigorous, high-probability FDP bounds that hold across all thresholds, so the error control applies to the data at hand rather than only on average. This flexibility is crucial for exploratory analysis in areas like drug discovery or outlier detection, where initial results often guide subsequent adjustments.
Key insights
Simultaneous FDP bounds enable flexible, data-driven threshold selection with rigorous statistical guarantees.
Principles
- FDR control only guarantees FDP is small on average, not for the data at hand.
- Post hoc threshold adjustments invalidate traditional FDR control.
- Exchangeability allows tractable joint distribution sampling for null conformal p-values.
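The distinction between these principles can be written out explicitly. The notation below is an assumption introduced for illustration and is not taken from the paper: R(t) is the number of conformal p-values at or below threshold t, V(t) is the number of those that are true nulls, and δ is the allowed failure probability.

```latex
\[
\mathrm{FDP}(t) = \frac{V(t)}{\max\{R(t),\,1\}}, \qquad
\mathrm{FDR}(t) = \mathbb{E}\big[\mathrm{FDP}(t)\big],
\]
\[
\mathbb{P}\Big(\mathrm{FDP}(t) \le \overline{\mathrm{FDP}}(t)\ \text{for all } t \in [0,1]\Big) \ge 1 - \delta .
\]
```

FDR control bounds the first expectation at one pre-fixed t. A simultaneous bound of the second form holds for every t at once, which is why a threshold chosen after looking at the data still inherits the guarantee.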
Method
Construct a high-probability envelope for the empirical CDF of "null" conformal p-values by sampling from their joint distribution. The shape of the envelope is modulated through the choice of summary statistic, such as Truncated Higher Criticism.
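A minimal sketch of the sampling idea follows. It is a simplified illustration under stated assumptions, not the paper's procedure or the authors' code. It assumes continuous scores (no ties), so that under exchangeability the joint law of null conformal p-values can be simulated from uniform scores. It also uses a constant-shift (Kolmogorov-Smirnov style) envelope instead of Truncated Higher Criticism, and it conservatively treats all test points as nulls. All function names are hypothetical.

```python
import numpy as np

def simulate_null_pvalues(n_cal, m_test, rng):
    """One joint draw of m_test null conformal p-values (uniform scores)."""
    cal_sorted = np.sort(rng.random(n_cal))
    test = rng.random(m_test)
    # Number of calibration scores >= each test score.
    n_ge = n_cal - np.searchsorted(cal_sorted, test, side="left")
    return (1 + n_ge) / (n_cal + 1)

def null_count_envelope(n_cal, m_test, delta=0.1, n_sim=2000, seed=0):
    """Monte Carlo upper envelope on the count of null p-values <= t."""
    rng = np.random.default_rng(seed)
    # Conformal p-values only take values k / (n_cal + 1).
    grid = np.arange(1, n_cal + 2) / (n_cal + 1)
    sups = np.empty(n_sim)
    for b in range(n_sim):
        p = np.sort(simulate_null_pvalues(n_cal, m_test, rng))
        ecdf = np.searchsorted(p, grid, side="right") / m_test
        sups[b] = np.max(ecdf - grid)  # sup deviation above the diagonal
    # Conservative empirical (1 - delta) quantile of the sup statistic.
    k = int(np.ceil((1 - delta) * (n_sim + 1)))
    shift = np.sort(sups)[min(k, n_sim) - 1]
    return grid, np.minimum(1.0, grid + shift) * m_test

def fdp_upper_bounds(pvals, grid, count_env):
    """FDP bound at every grid threshold from the null-count envelope."""
    rejections = np.searchsorted(np.sort(pvals), grid, side="right")
    ratio = count_env / np.maximum(rejections, 1)
    # With no rejections the FDP is zero by convention.
    return np.where(rejections > 0, np.minimum(1.0, ratio), 0.0)
```

The envelope is computed once from the calibration and test sizes alone, then compared against the observed rejection counts at every threshold. The Monte Carlo quantile is only an approximation of the exact one, so the number of simulations should be large relative to 1/δ.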
In practice
- Modulate envelope shape for tighter bounds in regions of primary interest.
- Apply to outlier detection and conformal selection problems.
- Code is available for reproducing numerical experiments.
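As an illustration of post hoc threshold selection, the sketch below picks the largest threshold whose simultaneous FDP bound stays within a chosen budget. It is not the authors' released code. The function name and the example numbers are hypothetical, and the bounds are assumed to come from some simultaneously valid procedure.

```python
import numpy as np

def largest_threshold_within_budget(thresholds, fdp_upper, budget):
    """Return the largest threshold whose FDP upper bound is <= budget.

    Returns None when no threshold satisfies the budget.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    fdp_upper = np.asarray(fdp_upper, dtype=float)
    ok = fdp_upper <= budget
    if not ok.any():
        return None
    return float(thresholds[ok].max())

# Illustrative numbers only, not results from the paper.
thresholds = [0.01, 0.05, 0.10, 0.20]
fdp_upper = [0.00, 0.08, 0.15, 0.40]
chosen = largest_threshold_within_budget(thresholds, fdp_upper, budget=0.10)
```

Because the bounds hold for all thresholds at once, the budget can be revised after seeing the results and the selection rerun without losing the guarantee.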
Topics
- Conformal Inference
- False Discovery Proportion
- Multiple Testing
- Outlier Detection
- Conformal Selection
- High-Probability Bounds
- Empirical CDF
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.