A nonparametric two-sample test using a parametric integral probability metric
Summary
A new nonparametric two-sample test, PReLU-TST, is introduced for detecting distributional differences between two independent samples without assuming a specific parametric distribution. This 45-page study proposes a test statistic, PReLU-IPM, based on an integral probability metric (IPM) whose discriminator class is a specially designed parametric family consisting of a single neural network node. The research establishes theoretical guarantees for PReLU-TST, including consistency and asymptotic equivalence to existing nonparametric IPM-based tests under regularity conditions. Empirical evaluations on multiple simulated and real benchmark datasets show that, in finite samples, PReLU-TST achieves higher power than competing methods against various alternatives, or performs comparably. The work has been accepted for publication in Statistical Analysis and Data Mining.
Key takeaway
For machine learning engineers and data scientists who need to check whether two datasets differ in distribution, PReLU-TST offers a powerful, distribution-free alternative. Consider integrating this nonparametric test into your workflow, especially for finite samples, where it showed higher or comparable power relative to competing tests. This could make model validation and data quality checks more reliable by increasing the chance that a genuine difference between samples is detected.
Key insights
PReLU-TST is a nonparametric two-sample test with theoretical guarantees (consistency and asymptotic equivalence to existing IPM-based tests) and higher or comparable finite-sample power relative to competing tests.
Principles
- Nonparametric tests avoid assuming a parametric form for the underlying distributions.
- IPMs can form the basis for test statistics.
- Discriminator class design impacts test power.
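For reference, the standard IPM definition behind these principles is written out below, with its empirical two-sample version. The single-node ReLU class shown last is only an illustration: the name F_relu and the constraints on θ and b are assumptions for this sketch, not the paper's exact PReLU discriminator class.

```latex
d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right|

\widehat{d}_{\mathcal{F}} = \sup_{f \in \mathcal{F}} \left| \frac{1}{n}\sum_{i=1}^{n} f(X_i) - \frac{1}{m}\sum_{j=1}^{m} f(Y_j) \right|

\mathcal{F}_{\mathrm{relu}} = \left\{\, x \mapsto \max(0, \theta^{\top} x + b) \;:\; \|\theta\|_2 = 1,\ |b| \le 1 \,\right\}
```

Taking F to be all 1-Lipschitz functions gives the Wasserstein-1 distance, and taking the unit ball of a reproducing kernel Hilbert space gives the maximum mean discrepancy (MMD). Restricting F to a single-node family changes which differences the statistic can detect, which is why the choice of class affects power.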
Method
The PReLU-TST procedure constructs a test statistic, PReLU-IPM, from an integral probability metric whose discriminator class is a parametric family of single-node neural networks.
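A minimal sketch of an IPM statistic with a single-node discriminator, assuming a plain ReLU unit f(z) = max(0, θᵀz + b) with unit-norm θ and b in [−1, 1], and approximating the supremum by random search over (θ, b). The function name `relu_unit_ipm` and the parameter `n_directions` are hypothetical; the paper's PReLU-IPM uses its own parametric activation and optimization, which this sketch does not reproduce.

```python
import numpy as np


def relu_unit_ipm(x, y, n_directions=500, seed=0):
    """Approximate sup over single ReLU units of |mean_x f - mean_y f|.

    Hypothetical discriminator class (not the paper's exact PReLU class):
    f(z) = max(0, theta @ z + b) with ||theta||_2 = 1 and b in [-1, 1].
    The supremum is approximated by evaluating many random (theta, b) pairs.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random unit-norm directions, one row per candidate discriminator.
    theta = rng.standard_normal((n_directions, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    b = rng.uniform(-1.0, 1.0, size=n_directions)
    # Discriminator outputs: shape (n_samples, n_directions).
    fx = np.maximum(0.0, x @ theta.T + b)
    fy = np.maximum(0.0, y @ theta.T + b)
    # Absolute mean gap per discriminator; keep the largest one.
    gaps = np.abs(fx.mean(axis=0) - fy.mean(axis=0))
    return float(gaps.max())


# Usage: the statistic should be larger when the second sample is shifted.
rng = np.random.default_rng(1)
x = rng.standard_normal((500, 2))
y_same = rng.standard_normal((500, 2))
y_shift = rng.standard_normal((500, 2)) + 1.0
stat_same = relu_unit_ipm(x, y_same)
stat_shift = relu_unit_ipm(x, y_shift)
print(stat_same, stat_shift)
```

Random search keeps the sketch short and dependency-free; a gradient-based optimizer over (θ, b) would be a natural alternative when the dimension is large.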
In practice
- Apply PReLU-TST for robust sample comparison.
- Evaluate PReLU-TST on new benchmark datasets.
- Consider single-node neural networks as IPM discriminators.
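To turn an IPM statistic into a test with a p-value, one common calibration is a permutation test. The sketch below assumes that approach (the summary does not state how PReLU-TST is calibrated) and defines its own hypothetical single-ReLU-unit statistic, `single_unit_stat`. The random discriminators are drawn once, independently of the data, so the observed and permuted statistics are computed over the same class.

```python
import numpy as np


def single_unit_stat(x, y, theta, b):
    """Max absolute mean gap over fixed ReLU units max(0, theta @ z + b)."""
    fx = np.maximum(0.0, x @ theta.T + b)
    fy = np.maximum(0.0, y @ theta.T + b)
    return float(np.abs(fx.mean(axis=0) - fy.mean(axis=0)).max())


def permutation_test(x, y, n_perm=200, n_directions=200, seed=0):
    """Permutation p-value for the hypothetical single-unit IPM statistic."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw candidate discriminators once, before touching the data.
    theta = rng.standard_normal((n_directions, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    b = rng.uniform(-1.0, 1.0, size=n_directions)

    observed = single_unit_stat(x, y, theta, b)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Under the null, sample labels are exchangeable: reshuffle them.
        idx = rng.permutation(pooled.shape[0])
        stat = single_unit_stat(pooled[idx[:n]], pooled[idx[n:]], theta, b)
        exceed += stat >= observed
    # The +1 correction keeps the p-value valid and strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: compare a same-distribution pair with a mean-shifted pair.
rng = np.random.default_rng(2)
x = rng.standard_normal((100, 2))
y_null = rng.standard_normal((100, 2))
y_alt = rng.standard_normal((100, 2)) + 1.0
_, p_null = permutation_test(x, y_null)
_, p_alt = permutation_test(x, y_alt)
print(p_null, p_alt)
```

With n_perm permutations the smallest attainable p-value is 1 / (n_perm + 1), so n_perm should be chosen large enough for the significance level in use.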
Topics
- Nonparametric Testing
- Two-Sample Test
- Integral Probability Metrics
- Neural Networks
- Distributional Differences
- Machine Learning Statistics
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.