Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

Source: stat.ML updates on arXiv.org · Field: Science & Research — Mathematics & Computational Sciences, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces a unified framework for establishing finite-sample, distribution-free upper bounds on the False Discovery Proportion (FDP) in conformal inference. Unlike existing methods, which control the expected FDP and require the rejection threshold to be fixed in advance, this approach provides high-probability bounds that hold simultaneously across all possible rejection thresholds. Thresholds can therefore be selected post hoc without invalidating the statistical guarantees. The method constructs a high-probability envelope for the empirical distribution function of the "null" conformal p-values by sampling from their joint distribution. It is applied to outlier detection and conformal selection, where synthetic and real-data experiments, including a drug-target interaction task on the DAVIS dataset, show bounds that are tighter and more reliably valid than those of previous approaches.
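
The contrast between the two kinds of guarantee can be written out explicitly. The notation below ($V(t)$ for the number of true nulls rejected at threshold $t$, $R(t)$ for the total number of rejections, $\overline{\mathrm{FDP}}$ for the bound, and the levels $\alpha$ and $\delta$) is assumed here for illustration and may differ from the paper's.

```latex
\mathrm{FDP}(t) = \frac{V(t)}{\max\{R(t),\, 1\}}

% Control in expectation, for a threshold rule \hat{t} prescribed in advance:
\mathbb{E}\bigl[\mathrm{FDP}(\hat{t})\bigr] \le \alpha

% Simultaneous high-probability bound, valid for every threshold at once:
\mathbb{P}\Bigl(\mathrm{FDP}(t) \le \overline{\mathrm{FDP}}(t)
  \ \text{for all } t \in [0, 1]\Bigr) \ge 1 - \delta
```

Because the second event covers every $t$ at once, a threshold chosen after looking at the data still satisfies $\mathrm{FDP}(\hat{t}) \le \overline{\mathrm{FDP}}(\hat{t})$ on that same event.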

Key takeaway

For data scientists and research scientists performing multiple testing with conformal p-values, this framework lets you select rejection thresholds adaptively after inspecting the data, without sacrificing statistical validity. You gain rigorous, high-probability FDP bounds that hold across all thresholds, providing reliable instance-wise error control. This flexibility is crucial for exploratory analysis in areas like drug discovery or outlier detection, where initial results often guide subsequent adjustments.

Key insights

Simultaneous FDP bounds enable flexible, data-driven threshold selection with rigorous statistical guarantees.

Principles

Method

Construct a high-probability envelope for the empirical CDF of the "null" conformal p-values by sampling from their joint distribution. The shape of the envelope is modulated by a summary statistic such as Truncated Higher Criticism.
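
A minimal Python sketch of the sampling idea follows. It is not the paper's construction: in place of Truncated Higher Criticism it uses a simpler constant-shift (Kolmogorov-Smirnov-style) band, and it assumes that calibration scores and null test scores are exchangeable, that larger scores are more anomalous, and that the number of true nulls may be bounded by the number of test points. All function names (`conformal_pvalues`, `null_envelope_shift`, `fdp_bound`) and the toy data are hypothetical.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values, assuming larger scores are more anomalous."""
    cal_sorted = np.sort(np.asarray(cal_scores))
    n = cal_sorted.size
    # Count calibration scores >= each test score.
    n_ge = n - np.searchsorted(cal_sorted, test_scores, side="left")
    return (1 + n_ge) / (n + 1)


def null_envelope_shift(n, m, delta, n_sim=1000, seed=0):
    """Monte Carlo constant c so that, with probability about 1 - delta,
    the ECDF of m null p-values stays below t + c for all t."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil((1 - delta) * (n_sim + 1)))
    if k > n_sim:
        raise ValueError("n_sim is too small for this delta")
    grid = np.arange(1, n + 2) / (n + 1)  # every value a p-value can take
    sups = np.empty(n_sim)
    for b in range(n_sim):
        # Under exchangeability the joint law of null p-values does not
        # depend on the score distribution, so uniform scores suffice.
        p = conformal_pvalues(rng.random(n), rng.random(m))
        ecdf = np.searchsorted(np.sort(p), grid, side="right") / m
        sups[b] = np.max(ecdf - grid)
    return float(np.sort(sups)[k - 1])  # conservative empirical quantile


def fdp_bound(pvals, t, shift):
    """Upper bound on FDP at threshold t, using m to bound the null count."""
    pvals = np.asarray(pvals)
    rejections = int(np.sum(pvals <= t))
    if rejections == 0:
        return 0.0
    return float(min(1.0, pvals.size * (t + shift) / rejections))


# Toy data: 700 inliers and 300 shifted outliers among the test points.
rng = np.random.default_rng(1)
cal = rng.normal(size=1000)
test = np.concatenate([rng.normal(size=700), rng.normal(loc=4.0, size=300)])
pvals = conformal_pvalues(cal, test)
shift = null_envelope_shift(n=cal.size, m=test.size, delta=0.1, n_sim=500)

# Post hoc choice: the largest observed threshold whose bound is at most 0.3.
thresholds = np.unique(pvals)
bounds = np.array([fdp_bound(pvals, t, shift) for t in thresholds])
ok = thresholds[bounds <= 0.3]
chosen = ok.max() if ok.size else None
print("shift:", round(shift, 3), "chosen threshold:", chosen)
```

A constant shift is loose at small thresholds, where the allowance `m * shift` can exceed the number of rejections. That weakness is one motivation for shaping the envelope with a statistic that weights the thresholds differently, as the method described above does.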

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.