Sequential Kernel-based Conditional Independence Testing via Adaptive Betting
Summary
A new method, Sequential Kernel-based Conditional Independence (SKCI) testing, addresses the fragility of existing sequential conditional independence (CI) tests that rely on the Model-X assumption, which treats the relevant conditional distribution as known exactly; in practice it usually has to be estimated. SKCI applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, incorporating a normalization scheme and a truncate-and-shift calibration strategy. These modifications significantly reduce Type I error inflation while maintaining high power. Evaluated on high-dimensional synthetic benchmarks (e.g., Gaussian, CI hardness, RatInABox neural data) and real-world fairness tasks (car insurance discrimination, dSprites image data), SKCI consistently outperforms existing sequential Model-X approaches, particularly in the "Pretrained" and "Online" modes, where conditional distributions are estimated. Code is available at https://github.com/he-zh/SKCI.
Key takeaway
For machine learning engineers developing or auditing models in dynamic environments, SKCI offers a more robust approach to sequential conditional independence testing. If your application involves continuously arriving data where conditional distributions must be estimated, SKCI is reported to control Type I error inflation while maintaining high power, even in challenging high-dimensional settings, which makes it a stronger candidate than existing sequential Model-X approaches. Consider SKCI when you need reliable statistical inference or fairness audits in online systems.
Key insights
SKCI offers robust sequential conditional independence testing even when conditional distributions are estimated, not exactly known.
Principles
- Exact Type I error control is generally impossible without strong assumptions in CI testing.
- Test statistics should be chosen to accumulate evidence quickly under weak signals.
- Sequential data partitioning (training, validation, test) is crucial for predictable payoff functions.
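The last principle can be illustrated with a minimal Python sketch. It is not taken from the SKCI code: the function name predictable_payoffs, the fit_payoff callback, and the rule "all earlier batches except the latest are training data, the latest earlier batch is validation data, the current batch is test data" are assumptions made only for illustration. The point it shows is that the payoff function applied to a batch is fixed before that batch is seen.

```python
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")  # type of one data batch (hypothetical; any container works)


def predictable_payoffs(
    batches: Sequence[B],
    fit_payoff: Callable[[List[B], List[B]], Callable[[B], float]],
) -> List[float]:
    """Evaluate each batch with a payoff function fit on earlier batches only.

    fit_payoff(train, val) is a user-supplied, hypothetical callback that
    returns a function mapping a batch to a scalar payoff.
    """
    history: List[B] = []
    payoffs: List[float] = []
    for batch in batches:
        # Need at least two past batches: one or more to train, one to validate.
        if len(history) >= 2:
            train, val = history[:-1], history[-1:]
            payoff_fn = fit_payoff(train, val)  # never sees the current batch
            payoffs.append(payoff_fn(batch))    # current batch acts as test data
        history.append(batch)                   # only now does it join the past
    return payoffs
```

Because the current batch is appended to the history only after it has been scored, each payoff depends on past data through the fitted function alone, which is the property a betting-based test needs.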
Method
SKCI uses testing-by-betting with an adaptively optimized Kernel Conditional Independence statistic, self-normalization, and a truncate-and-shift calibration strategy in which the shift is estimated using a Gaussian approximation.
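To make the testing-by-betting idea concrete, here is a generic Python sketch of a betting wealth process. It is an illustration under stated assumptions, not the SKCI procedure: the function name betting_test, the fixed bet fraction, and the requirement that payoffs lie in [-1, 1] are choices made for this example, and the kernel statistic, self-normalization, and truncate-and-shift calibration from the paper are not reproduced. If each payoff has conditional mean at most zero under the null, the wealth is a nonnegative supermartingale, so by Ville's inequality rejecting once it reaches 1/alpha keeps the Type I error at or below alpha at any stopping time.

```python
from typing import Iterable, Tuple


def betting_test(
    payoffs: Iterable[float], alpha: float = 0.05, bet: float = 0.5
) -> Tuple[bool, int, float]:
    """Generic sequential test by betting with a fixed bet fraction.

    payoffs: values in [-1, 1] assumed to have conditional mean <= 0
             under the null hypothesis (an assumption of this sketch).
    Returns (rejected, number_of_payoffs_used, final_wealth).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must be in [0, 1) so wealth stays positive")

    wealth = 1.0  # start with one unit of wealth
    t = 0
    for t, g in enumerate(payoffs, start=1):
        if not -1.0 <= g <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * g       # multiplicative wealth update
        if wealth >= 1.0 / alpha:     # enough evidence against the null
            return True, t, wealth
    return False, t, wealth
```

For example, betting_test([1.0] * 20) multiplies the wealth by 1.5 at every step and stops at the eighth payoff, when the wealth first exceeds 20. A constant stream of zero payoffs leaves the wealth at 1 and never rejects.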
In practice
- Apply SKCI to audit for fairness in systems like car insurance pricing.
- Use SKCI for robust conditional independence testing in high-dimensional online data streams.
- Leverage the provided code at https://github.com/he-zh/SKCI for implementation.
Topics
- Conditional Independence Testing
- Sequential Hypothesis Testing
- Kernel Methods
- Model-X Paradigm
- Fairness Auditing
- Anytime-Valid Testing
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.