LOTTERY: Learning from Reference-Only Samples in Two-Sample Testing under Size Asymmetry
Summary
LOTTERY is a novel method for data-adaptive two-sample testing, specifically addressing few-shot settings characterized by severe sample-size imbalance, where abundant reference samples are available but only a few query samples exist. Unlike traditional data-splitting paradigms that struggle in such scenarios, LOTTERY constructively utilizes the large reference dataset. It learns reference-dependent representations that effectively summarize the salient structure of the reference distribution, providing informative signals for detecting distributional departures. The method incorporates diverse representation families capturing both global and local data structures, adaptively weighting them solely using reference samples through an uncertainty-guided principle. Theoretically, LOTTERY guarantees permutation-based type I error control and demonstrates consistency, with test power converging to one as sample sizes grow, provided the representation set includes at least one consistent representation. Empirically, it achieves strong performance across various benchmarks while preserving type I error control.
Key takeaway
Research scientists running two-sample tests under severe sample-size imbalance, especially in few-shot scenarios, should consider LOTTERY as an alternative to traditional data splitting. It leverages abundant reference data to maintain type I error control and achieve strong power, giving a theoretically grounded and empirically effective way to detect distributional shifts when query samples are scarce.
Key insights
LOTTERY enables robust two-sample testing in few-shot, imbalanced settings by learning reference-dependent representations from abundant reference data.
Principles
- Leverage abundant reference data constructively.
- Adaptively weight representations via uncertainty.
- Ensure permutation-based type I error control.
Method
LOTTERY learns reference-dependent representations from abundant reference samples, drawing on diverse representation families that capture both global and local structure. It weights these families adaptively, using only the reference samples and an uncertainty-guided principle, and uses the resulting representations to detect distributional departures.
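To make the permutation-based type I error control concrete, here is a minimal sketch of a generic permutation two-sample test under sample-size imbalance. It is not LOTTERY itself: this summary does not specify how the representations are learned or weighted, so the sketch assumes a fixed, hypothetical feature map (`hypothetical_feature_map`) and a simple squared mean-difference statistic. The function name, parameters, and synthetic data are all illustrative assumptions.

```python
import numpy as np


def permutation_two_sample_test(reference, query, feature_map,
                                n_permutations=999, seed=0):
    """Permutation p-value for H0: reference and query share one distribution.

    `feature_map` is an assumed, fixed representation (a stand-in for a
    learned one); it maps an (n, d) array to an (n, k) array.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([reference, query])
    features = feature_map(pooled)
    n_ref = len(reference)
    n_total = len(pooled)

    def statistic(indices):
        # Squared distance between the two group means in feature space.
        ref_mean = features[indices[:n_ref]].mean(axis=0)
        query_mean = features[indices[n_ref:]].mean(axis=0)
        return float(np.sum((ref_mean - query_mean) ** 2))

    observed = statistic(np.arange(n_total))
    exceed = 0
    for _ in range(n_permutations):
        # Reassign group labels at random, keeping the group sizes fixed.
        if statistic(rng.permutation(n_total)) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling among the permutations,
    # which keeps the p-value valid at any finite number of permutations.
    return (1 + exceed) / (1 + n_permutations)


def hypothetical_feature_map(x):
    # Illustrative representation: raw coordinates plus their squares.
    return np.hstack([x, x ** 2])


# Synthetic imbalanced example: 2000 reference points, 5 shifted query points.
data_rng = np.random.default_rng(42)
reference = data_rng.normal(size=(2000, 2))
query = data_rng.normal(loc=5.0, size=(5, 2))
p_value = permutation_two_sample_test(reference, query, hypothetical_feature_map)
print(f"p-value: {p_value:.4f}")
```

Because the feature map here is fixed in advance, permuting the pooled labels gives a valid p-value whatever the group sizes are. How LOTTERY retains this guarantee while learning its representations from the reference samples is established in the original article, not in this sketch.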
In practice
- Apply to few-shot scenarios with severe sample-size imbalance.
- Use for detecting subtle distribution shifts.
- Benchmark against traditional data-splitting methods.
Topics
- Two-Sample Testing
- Sample-Size Imbalance
- Few-Shot Learning
- Reference-Dependent Representations
- Type I Error Control
- Machine Learning
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.