Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables
Summary
Meiyi Zhu and Osvaldo Simeone introduce Post-Hoc Conformal Selection (PH-CS), a framework that removes a key limitation of conventional Conformal Selection (CS): the False Discovery Rate (FDR) level must be fixed before the data are observed. PH-CS instead generates a path of candidate selection sets, each paired with a data-driven estimate of the False Discovery Proportion (FDP), so that users can pick the operating point that maximizes a user-specified utility function balancing selection size against FDP. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS provides a finite-sample post-hoc reliability guarantee: the expected ratio of the true FDP to the estimated FDP is at most 1. The framework extends to quality criteria defined by a general risk (PH-RCS) and supports priority weighting. In experiments on synthetic and real-world datasets, including Recruitment, Musk, and Shuttle, PH-CS consistently satisfies user-imposed utility and size constraints, which CS does not, while maintaining competitive FDR control and reliable FDP estimates.
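The reliability guarantee can be written compactly. The notation below is assumed for illustration and is not taken from the paper: \hat{S} is the selected set, \hat{\alpha} is its data-driven FDP estimate, and \mathcal{H}_0 is the set of candidates that do not meet the quality target.

```latex
\[
\mathrm{FDP}(\hat{S}) = \frac{|\hat{S} \cap \mathcal{H}_0|}{\max(|\hat{S}|, 1)},
\qquad
\mathbb{E}\!\left[ \frac{\mathrm{FDP}(\hat{S})}{\hat{\alpha}} \right] \le 1 .
\]
```

Read this way, the bound is meant to hold even though \hat{S} and \hat{\alpha} are both chosen after the data are seen, which is what distinguishes it from a fixed-level FDR statement.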
Key takeaway
For data scientists and research scientists who make data-driven selection decisions, PH-CS removes the need to commit to a fixed FDR level upfront. After seeing the data, you can trade off the number of selected items against the estimated FDP to fit project constraints, for example by enforcing a minimum selection size or by weighing size against reliability. This flexibility leads to more practical and resource-aware outcomes than conventional Conformal Selection.
Key insights
PH-CS offers data-adaptive selection with post-hoc FDR control by leveraging e-variables and utility maximization.
Principles
- FDR control can be made data-adaptive post-hoc.
- E-variables give validity that holds uniformly over levels, so the level can be chosen after seeing the data.
- Utility functions balance selection size and reliability.
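As a point of reference for these principles, here is a minimal sketch of the standard e-BH procedure at a fixed level. It assumes the e-values have already been computed; the function name `e_bh` and the sample e-values are illustrative and not taken from the paper.

```python
import numpy as np

def e_bh(e_values, alpha):
    """Return the (sorted) indices selected by e-BH at a fixed level alpha."""
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)                 # indices by decreasing e-value
    k = np.arange(1, m + 1)
    # e-BH keeps the largest k such that the k-th largest e-value
    # satisfies e_(k) >= m / (alpha * k).
    ok = k * e[order] >= m / alpha
    if not ok.any():
        return np.array([], dtype=int)
    k_star = int(np.max(np.nonzero(ok)[0])) + 1
    return np.sort(order[:k_star])

# Hypothetical e-values for five candidates.
e_values = [40.0, 0.5, 12.0, 1.0, 8.0]
print(e_bh(e_values, alpha=0.2))
print(e_bh(e_values, alpha=0.5))
```

The same e-values are reused at every level, and only the threshold changes. That is the property a post-hoc choice of level relies on.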
Method
PH-CS computes conformal e-variables, generates a path of candidate selection sets, estimates FDP for each, and selects the set maximizing a user-defined utility function.
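The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: e-values are taken as given, candidate sets are the nested top-k sets, and each set's FDP estimate is the smallest level at which an e-BH-style threshold admits all k items, m / (k * e_(k)). The names `selection_path` and `post_hoc_select` and the example utility are hypothetical.

```python
import numpy as np

def selection_path(e_values):
    """Nested top-k candidate sets, each with an e-BH-style FDP estimate."""
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)                 # indices by decreasing e-value
    sizes = np.arange(1, m + 1)
    # Assumed estimate for the top-k set: m / (k * e_(k)).
    # Values above 1 are uninformative; a zero e-value gives inf.
    with np.errstate(divide="ignore"):
        fdp_hat = m / (sizes * e[order])
    return order, sizes, fdp_hat

def post_hoc_select(e_values, utility):
    """Pick the candidate set that maximizes utility(size, fdp_hat)."""
    order, sizes, fdp_hat = selection_path(e_values)
    scores = np.array([utility(k, a) for k, a in zip(sizes, fdp_hat)])
    best = int(np.argmax(scores))
    return np.sort(order[: sizes[best]]), float(fdp_hat[best])

# Hypothetical e-values and a hypothetical utility that rewards size
# and penalizes the estimated FDP.
e_values = [40.0, 0.5, 12.0, 1.0, 8.0]
selected, fdp_estimate = post_hoc_select(
    e_values, lambda size, fdp: size - 10.0 * fdp
)
print(selected, fdp_estimate)
```

With these inputs the utility is maximized by the three largest e-values, with an estimated FDP of 5/24. A size constraint could be expressed by having the utility return `-np.inf` for sets smaller than the required minimum.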
In practice
- Use PH-CS for flexible candidate selection in drug discovery.
- Apply PH-CS in genomics to adaptively pursue candidates.
- Employ PH-CS for feature selection to adjust set size post-hoc.
Topics
- Post-Hoc Conformal Selection
- Conformal E-variables
- E-Benjamini-Hochberg Procedure
- False Discovery Rate Control
- Utility-Driven Selection
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.