Deep-testing: the case of dependence detection
Summary
The authors propose "deep-testing," a procedure that applies deep learning to classical hypothesis testing, here for detecting dependence between two continuous univariate random variables. A deep neural network is trained as a classifier on data simulated under both the null (independence) and alternative (dependence) hypotheses. The learned classification map serves as the test statistic, which is then calibrated by Monte Carlo simulation to give a near-exact test. Three architectures were evaluated: All-CNN (scatter-plot images), All-MLP (19 dependence indicators plus sample size), and All-CNN-MLP (a hybrid of both). In a large-scale simulation study, the hybrid All-CNN-MLP achieved the highest overall power against 19 competing methods across 20 complex dependence structures, and it maintained strong performance on 6 dependence models not seen during training.
Key takeaway
For AI Scientists and Machine Learning Engineers developing statistical inference tools, deep-testing offers a powerful alternative to traditional independence tests. Your teams should consider implementing the hybrid All-CNN-MLP architecture, which showed the highest average power in the study and held up on dependence patterns it was not trained on. Critical values come from Monte Carlo simulation under the null hypothesis, which keeps the test near-exact while detecting complex dependencies.
Key insights
Deep-testing leverages deep learning for hypothesis testing, using neural networks to classify samples as independent or dependent.
Principles
- Hypothesis testing can be reframed as a binary classification problem.
- Deep learning models can learn highly discriminative test statistics.
- Margin-free representations let a single trained network be reused across marginal distributions and sample sizes.
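One way to picture a margin-free, size-independent input is to rasterize the rank-transformed sample onto a fixed grid. The sketch below is illustrative only: the function name `sample_to_image`, the 32-by-32 grid, and the normalization by n are assumptions, not the image construction used in the paper.

```python
import numpy as np

def sample_to_image(x, y, bins=32):
    """Rasterize a bivariate sample into a fixed-size grid (hypothetical helper)."""
    n = len(x)
    # Normalized ranks in (0, 1): unchanged by any increasing transform of x or y.
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    # Fixed grid on the unit square, so the output shape does not depend on n.
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    # Divide by n so cells hold proportions rather than raw counts.
    return counts / n
```

Samples of 50 or 500 points both map to a grid of the same shape, and replacing x by exp(x) leaves the grid unchanged, which is the property the principle above relies on.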
Method
Train a deep neural network on simulated data (null/alternative hypotheses), then use the learned classification map as a test statistic. Calibrate this statistic under the null hypothesis via Monte Carlo simulations to derive near-exact critical values.
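The calibration step can be sketched as follows. This is a minimal illustration, not the authors' code: a simple rank-correlation statistic stands in for the trained network's output, and the names `rank_correlation_statistic`, `monte_carlo_critical_value`, and `reject_independence` are hypothetical. Because the stand-in statistic depends only on ranks, simulating uniform variables under the null is enough for any continuous margins.

```python
import numpy as np

def rank_correlation_statistic(x, y):
    """Stand-in test statistic (an assumption, NOT the paper's network):
    absolute correlation between the ranks of x and y."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return abs(np.corrcoef(rx, ry)[0, 1])

def monte_carlo_critical_value(statistic, n, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the (1 - alpha) quantile of the statistic under independence."""
    rng = np.random.default_rng(seed)
    null_stats = np.empty(n_sim)
    for b in range(n_sim):
        # Independent draws: this is the null hypothesis.
        x = rng.uniform(size=n)
        y = rng.uniform(size=n)
        null_stats[b] = statistic(x, y)
    return float(np.quantile(null_stats, 1.0 - alpha))

def reject_independence(statistic, x, y, critical_value):
    """Reject the null when the observed statistic exceeds the critical value."""
    return statistic(x, y) > critical_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    c = monte_carlo_critical_value(rank_correlation_statistic, n=50)
    x = rng.normal(size=50)
    y = x + 0.1 * rng.normal(size=50)  # strongly dependent pair
    print(c, reject_independence(rank_correlation_statistic, x, y, c))
```

In deep-testing, the statistic passed to the calibration routine would be the network's learned classification map; the simulation loop itself would be unchanged.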
In practice
- Use the hybrid All-CNN-MLP architecture for robust independence testing.
- Employ rank-based pseudo-observations for margin-free input features.
- Simulate diverse dependence structures for comprehensive training.
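The last two points can be sketched together: build a labeled training set from simulated pairs and feed the network rank-based pseudo-observations rather than raw values. Everything here is an assumption made for illustration, including the helper names and the three dependence families (linear, quadratic, circle); the source does not list the 20 structures used in the study.

```python
import numpy as np

def pseudo_observations(x):
    """Map a continuous sample to normalized ranks in (0, 1)."""
    x = np.asarray(x)
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    return ranks / (len(x) + 1)

def simulate_pair(kind, n, rng):
    """Draw one bivariate sample; the families are illustrative choices."""
    x = rng.normal(size=n)
    if kind == "independent":
        y = rng.normal(size=n)
    elif kind == "linear":
        y = x + rng.normal(size=n)
    elif kind == "quadratic":
        y = x ** 2 + rng.normal(scale=0.5, size=n)
    elif kind == "circle":
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
        x = np.cos(theta) + rng.normal(scale=0.1, size=n)
        y = np.sin(theta) + rng.normal(scale=0.1, size=n)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return pseudo_observations(x), pseudo_observations(y)

def make_training_set(n_per_class, n, seed=0):
    """Return (samples, labels): label 0 = independent, label 1 = dependent."""
    rng = np.random.default_rng(seed)
    dependent_kinds = ["linear", "quadratic", "circle"]
    samples, labels = [], []
    for i in range(n_per_class):
        samples.append(np.column_stack(simulate_pair("independent", n, rng)))
        labels.append(0)
        # Cycle through the dependence families so each is represented.
        kind = dependent_kinds[i % len(dependent_kinds)]
        samples.append(np.column_stack(simulate_pair(kind, n, rng)))
        labels.append(1)
    return np.stack(samples), np.array(labels)
```

A call such as `make_training_set(1000, 100)` would give a balanced array of shape (2000, 100, 2) that a classifier could be trained on; widening the list of dependence families is how the training set would be made more diverse.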
Topics
- Deep-testing
- Hypothesis Testing
- Independence Testing
- Deep Learning
- Neural Networks
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.