Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Research Methodology & Innovation · Depth: Expert, extended

Summary

Meiyi Zhu and Osvaldo Simeone introduce Post-Hoc Conformal Selection (PH-CS), a framework that addresses a key limitation of traditional Conformal Selection (CS): the False Discovery Rate (FDR) level must be fixed before the data are observed. PH-CS generates a path of candidate selection sets, each paired with a data-driven False Discovery Proportion (FDP) estimate, so that users can choose the operating point that maximizes a user-specified utility function balancing selection size against FDP. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS provides a finite-sample post-hoc reliability guarantee under which the ratio between estimated FDP and true FDP is, on average, upper bounded by 1. The framework extends to controlling quality defined by a general risk (PH-RCS) and can incorporate priority weighting. In experiments on synthetic and real-world datasets, including Recruitment, Musk, and Shuttle, PH-CS consistently satisfies user-imposed utility and size constraints where CS does not, while maintaining competitive FDR control and reliable FDP estimates.

Key takeaway

For data scientists and research scientists who make data-driven selection decisions, PH-CS removes the need to commit to a fixed FDR level upfront, as traditional Conformal Selection requires. You can instead adjust the balance between the number of selected items and the False Discovery Rate (FDR) after observing the data, in line with specific project constraints. This flexibility lets you optimize a utility, such as meeting a minimum selection size or trading off size against reliability, which leads to more practical and resource-aware outcomes.
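To make the two kinds of utility mentioned here concrete, the following is a minimal sketch and not the authors' formulation. The function names (`min_size_utility`, `tradeoff_utility`, `pick`), the linear penalty form, and the numbers in `toy_path` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A candidate operating point: (selection size, estimated FDP).
Candidate = Tuple[int, float]
Utility = Callable[[int, float], float]


def min_size_utility(min_size: int) -> Utility:
    """Among sets with at least `min_size` items, prefer the lowest estimated FDP."""
    def utility(size: int, fdp_hat: float) -> float:
        # Sets that are too small are marked infeasible with -inf.
        return -fdp_hat if size >= min_size else float("-inf")
    return utility


def tradeoff_utility(penalty: float) -> Utility:
    """Reward size, penalize estimated FDP linearly (hypothetical form)."""
    def utility(size: int, fdp_hat: float) -> float:
        return size - penalty * fdp_hat
    return utility


def pick(candidates: List[Candidate], utility: Utility) -> Optional[Candidate]:
    """Return the feasible candidate with the highest utility, or None."""
    feasible = [c for c in candidates if utility(*c) > float("-inf")]
    if not feasible:
        return None
    return max(feasible, key=lambda c: utility(*c))


# Illustrative numbers only, not results from the paper.
toy_path: List[Candidate] = [(1, 0.125), (2, 0.25), (3, 0.42)]

smallest_reliable = pick(toy_path, min_size_utility(2))
largest_useful = pick(toy_path, tradeoff_utility(2.0))
```

With a minimum-size requirement the choice moves to the first set that is large enough, while the trade-off utility favors larger sets as long as the estimated FDP grows slowly relative to the penalty.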

Key insights

PH-CS offers data-adaptive selection with post-hoc FDR control by leveraging e-variables and utility maximization.

Method

PH-CS computes conformal e-variables, generates a path of candidate selection sets, estimates FDP for each, and selects the set maximizing a user-defined utility function.
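The steps in this method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it takes the e-values as given (the conformal e-variable construction is not reproduced here), builds the path from the sets that standard e-BH would return at some level, and uses the smallest such level as the FDP estimate for each set. The function names `candidate_path` and `choose_operating_point`, and the example utility, are hypothetical.

```python
import numpy as np


def candidate_path(e_values):
    """Return [(fdp_estimate, selected_indices), ...] ordered by increasing size.

    Assumption: the top-k set by e-value is a candidate when e-BH would output
    exactly that set at some level alpha <= 1; its FDP estimate is the smallest
    such level, m / (k * e_(k)), where e_(k) is the k-th largest e-value.
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e, kind="stable")      # indices by decreasing e-value
    k = np.arange(1, m + 1)
    with np.errstate(divide="ignore"):
        level = m / (k * e[order])             # zero e-values give level = inf
    # Smallest level among strictly larger sets (inf for the full set).
    suffix_min = np.minimum.accumulate(level[::-1])[::-1]
    next_min = np.append(suffix_min[1:], np.inf)
    # e-BH stops at size k only if no larger set is admissible at that level.
    keep = (level < next_min) & (level <= 1.0)
    return [(float(level[i]), np.sort(order[: i + 1])) for i in np.flatnonzero(keep)]


def choose_operating_point(path, utility):
    """Pick the candidate maximizing utility(size, fdp_estimate).

    Assumption: when the path is empty, fall back to selecting nothing.
    """
    if not path:
        return 0.0, np.array([], dtype=int)
    return max(path, key=lambda c: utility(len(c[1]), c[0]))


# Made-up e-values for illustration; a larger value means stronger evidence.
e_vals = [40.0, 10.0, 4.0, 1.0, 0.5]
path = candidate_path(e_vals)

# Hypothetical utility: reward size, penalize the estimated FDP.
fdp_hat, selected = choose_operating_point(path, lambda size, fdp: size - 2.0 * fdp)
```

Tied e-values are handled implicitly: a set that would split a tie is never a candidate, because the larger set containing all tied items has a strictly smaller level.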

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.