Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Sequential Kernel-based Conditional Independence (SKCI) testing is a new method that addresses the fragility of existing sequential conditional independence (CI) tests when the conditional distribution that the Model-X assumption treats as known must instead be estimated. SKCI applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic and adds a normalization scheme and a truncate-and-shift calibration strategy. These modifications reduce Type I error inflation while maintaining high power. On high-dimensional synthetic benchmarks (e.g., Gaussian, CI hardness, RatInABox neural data) and real-world fairness tasks (car insurance discrimination, dSprites image data), SKCI consistently outperforms existing sequential Model-X approaches, particularly in the "Pretrained" and "Online" modes, where conditional distributions are estimated. Code is available at https://github.com/he-zh/SKCI.

Key takeaway

For machine learning engineers who develop or audit models on continuously arriving data, SKCI offers sequential conditional independence testing that remains reliable when conditional distributions must be estimated rather than known. It controls Type I error while maintaining high power, including in challenging high-dimensional settings, and in the reported experiments it outperforms existing sequential Model-X approaches. Consider integrating SKCI when statistical inference or fairness auditing in an online system depends on a conditional independence test.

Key insights

SKCI offers robust sequential conditional independence testing even when conditional distributions are estimated, not exactly known.

Principles

Method

SKCI uses testing-by-betting with an adaptively optimized Kernel Conditional Independence statistic, self-normalization, and a truncate-and-shift calibration strategy in which the shift is estimated via a Gaussian approximation.
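To make the testing-by-betting idea concrete, here is a minimal sketch of a generic betting test. It is not the SKCI implementation from the repository. A bettor starts with wealth 1, stakes a fraction of it on each new payoff, and rejects the null once wealth reaches 1/alpha. Everything specific here is an assumption for illustration: the toy linear-Gaussian model with known conditional means, the residual-product payoff (a stand-in for a kernel statistic), the running-scale normalization, the clipping to [-1, 1], and the function names `toy_payoffs` and `betting_test`. The truncate-and-shift calibration is not reproduced; in this symmetric toy model, clipping alone keeps the payoff mean at zero under the null.

```python
import numpy as np


def toy_payoffs(n, beta, rng):
    """Bounded payoffs for a toy stream (illustrative assumption, not SKCI).

    Model: Z ~ N(0,1), X = Z + noise, Y = Z + beta*X + noise.
    beta = 0 means X is independent of Y given Z (the null).
    """
    z = rng.normal(size=n)
    x = z + rng.normal(size=n)
    y = z + beta * x + rng.normal(size=n)
    # Residuals using conditional means assumed known: E[X|Z] = Z, and E[Y|Z] = Z under the null.
    raw = (x - z) * (y - z)
    payoffs = np.empty(n)
    scale_sq = 1.0  # prior pseudo-observation for the running second moment
    for t in range(n):
        # Normalize with a scale built from strictly earlier samples, then clip to [-1, 1].
        payoffs[t] = np.clip(raw[t] / np.sqrt(scale_sq), -1.0, 1.0)
        scale_sq = (scale_sq * (t + 1) + raw[t] ** 2) / (t + 2)
    return payoffs


def betting_test(payoffs, alpha=0.05, max_bet=0.5):
    """Sequential test: reject as soon as wealth reaches 1/alpha.

    Returns (rejected, stopping_index, wealth_at_stop).
    """
    wealth = 1.0
    total, total_sq = 0.0, 0.0
    for t, g in enumerate(payoffs, start=1):
        if t > 1:
            # Bet fraction depends only on past payoffs, so it is fixed before g is seen.
            mean = total / (t - 1)
            second = total_sq / (t - 1)
            bet = float(np.clip(mean / (second + 1e-8), 0.0, max_bet))
        else:
            bet = 0.0
        wealth *= 1.0 + bet * g  # factor is at least 0.5, so wealth stays positive
        if wealth >= 1.0 / alpha:
            return True, t, wealth
        total += g
        total_sq += g * g
    return False, len(payoffs), wealth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for beta in (0.0, 1.0):
        rejected, t_stop, wealth = betting_test(toy_payoffs(2000, beta, rng))
        print(f"beta={beta}: rejected={rejected}, stopped at t={t_stop}, wealth={wealth:.3g}")
```

Because each bet is chosen before its payoff is observed and the payoffs have zero conditional mean under the null in this toy model, the wealth is a nonnegative martingale, and Ville's inequality bounds the probability of ever reaching 1/alpha by alpha. That is what lets the test stop at any time without inflating Type I error.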

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.