Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

2026-06-16 · Source: stat.ML updates on arXiv.org · Field: Science & Research — Mathematics & Computational Sciences, Research Methodology & Innovation, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper presents a framework for False Discovery Rate (FDR) control in large-scale hypothesis testing. It addresses two challenges: p-value uncertainty caused by having only a finite number of null draws, and the inherent structure within hypothesis spaces. The framework introduces two decision rules. Rule 1, a "Model-Free" approach, guarantees exact FDR control at the cost of lower statistical power. Rule 2, based on "Mirror Statistics," targets higher power by adapting mirror-statistic control to count space, and controls FDR up to a quantifiable slack. Using a Reproducing Kernel Hilbert Space (RKHS) formulation, the framework also provides a policy for allocating null distribution samples efficiently. Empirical evaluations on 10 ADBench datasets and the AlpacaEval 2.0 LLM-as-judge benchmark show that both rules maintain FDR control, with Rule 2 consistently achieving higher power. The adaptive allocation policy further improves decisions and power per unit of sampling budget.

Key takeaway

For data scientists running large-scale hypothesis tests with limited null samples, this framework offers FDR control with improved power. Consider Rule 2 when higher power is worth a quantifiable slack in the FDR guarantee, and Rule 1 when exact control is required. Use the adaptive allocation policy to spend the null-sampling budget where it most affects decisions, especially in structured hypothesis spaces.

Key insights

A single framework handles FDR control under both p-value uncertainty from finite null draws and structured hypothesis spaces, improving power and resource allocation.

Principles

Finite null draws introduce p-value uncertainty in hypothesis testing.
Structured hypothesis spaces can be leveraged to boost statistical power.
Mirror symmetry in count space enables robust FDR control with quantifiable slack.
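
To make the first principle concrete, here is a minimal sketch of how a finite number of null draws limits p-value resolution. It uses the standard conservative Monte Carlo estimate, (1 + number of null draws at least as extreme) / (1 + n), which is a common convention and not necessarily the estimator used in the paper. The function names and the observed statistic are hypothetical.

```python
import random

def mc_p_value(observed, null_draws):
    """Conservative Monte Carlo p-value: (1 + #{null >= observed}) / (1 + n)."""
    exceed = sum(1 for x in null_draws if x >= observed)
    return (1 + exceed) / (1 + len(null_draws))

def smallest_attainable_p(n_draws):
    """With n draws, the estimate above can never fall below 1 / (n + 1)."""
    return 1 / (n_draws + 1)

rng = random.Random(0)
observed = 2.5  # hypothetical test statistic, assumed standard normal under the null
for n in (19, 99, 999):
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Same observed statistic, different budgets: the estimate and its floor both change.
    print(n, mc_p_value(observed, draws), smallest_attainable_p(n))
```

With only 19 draws, no hypothesis can reach a p-value below 0.05, however strong its evidence, which is why the sampling budget matters for any downstream FDR procedure.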

Method

The framework models the data with count-based likelihoods and encodes structure through RKHS-based priors. It offers two decision rules: Rule 1 for exact FDR control, and Rule 2 for higher power with a controlled slack. An adaptive policy allocates null samples across hypotheses.
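
The summary does not spell out how Rule 2 maps counts to mirror statistics, so the sketch below shows only the generic mirror-statistic thresholding idea that such a rule builds on. It assumes each hypothesis has a real-valued statistic that is symmetric about zero under the null, so the number of large negative values estimates the number of false positives among the large positive ones. The function names are hypothetical, and this is not the paper's Rule 2.

```python
def mirror_threshold(stats, q):
    """Smallest t > 0 whose estimated false discovery proportion is at most q.

    The estimate is #{M_j <= -t} / max(#{M_j >= t}, 1). Returns None if no
    candidate threshold meets the target level q.
    """
    candidates = sorted({abs(m) for m in stats if m != 0})
    for t in candidates:
        neg = sum(1 for m in stats if m <= -t)  # proxy for false positives
        pos = sum(1 for m in stats if m >= t)   # candidate discoveries
        if neg / max(pos, 1) <= q:
            return t
    return None

def select(stats, q):
    """Indices of hypotheses rejected at target level q."""
    t = mirror_threshold(stats, q)
    return [] if t is None else [j for j, m in enumerate(stats) if m >= t]

# Hypothetical mirror statistics: large positive values suggest true signals.
stats = [5.0, 4.0, 3.0, -0.5, 0.4, -0.3, 2.5]
print(select(stats, q=0.25))
```

Because the false-positive count is estimated rather than known, procedures of this kind typically control FDR approximately, which is consistent with the "quantifiable slack" the summary attributes to Rule 2.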

In practice

Use count-based likelihoods to handle uncertain p-values from finite null samples.
Apply RKHS methods to exploit inherent structure in hypothesis spaces.
Prioritize null sample allocation based on uncertainty and potential impact on decisions.
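
The first and third points can be sketched together under simplifying assumptions. If a hypothesis has true p-value p, the number of n independent null draws that exceed its observed statistic is Binomial(n, p), so under a uniform prior the posterior for p is Beta(count + 1, n - count + 1). The uniform prior stands in for the structured RKHS prior described above, and the allocation heuristic (sample next where the posterior sits closest to the decision threshold, measured in posterior standard deviations) is an illustrative assumption, not the paper's policy. All names are hypothetical.

```python
from math import sqrt

def beta_posterior(exceed, n_draws):
    """Mean and sd of Beta(exceed + 1, n_draws - exceed + 1).

    This is the posterior for the p-value given a Binomial(n_draws, p) count
    of exceedances and a uniform prior (an assumption made for this sketch).
    """
    a, b = exceed + 1, n_draws - exceed + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

def ambiguity(exceed, n_draws, threshold):
    """Distance from the threshold in posterior sd units; smaller means less settled."""
    mean, sd = beta_posterior(exceed, n_draws)
    return abs(mean - threshold) / sd

def next_to_sample(counts, threshold):
    """Index of the hypothesis whose decision is least settled."""
    return min(range(len(counts)), key=lambda j: ambiguity(*counts[j], threshold))

# Hypothetical (exceedance count, draws so far) per hypothesis.
counts = [(0, 100), (4, 100), (50, 100)]
print(next_to_sample(counts, threshold=0.05))
```

The first and third hypotheses are clearly below and above the threshold, so extra draws there are unlikely to change a decision; the heuristic directs the budget to the borderline one.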

Topics

False Discovery Rate
Structured Hypothesis Spaces
Reproducing Kernel Hilbert Space
Adaptive Resource Allocation
Statistical Power
Large-Scale Hypothesis Testing

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.