Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

2026-06-18 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A new method, Sequential Kernel-based Conditional Independence (SKCI) testing, addresses the fragility of existing sequential conditional independence (CI) tests that rely on the Model-X assumption, i.e., that the relevant conditional distribution is known exactly, when that distribution must instead be estimated. SKCI applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, incorporating a normalization scheme and a truncate-and-shift calibration strategy. These modifications significantly reduce Type I error inflation while maintaining high power. Evaluated on high-dimensional synthetic benchmarks (e.g., Gaussian, CI hardness, RatInABox neural data) and real-world fairness tasks (car insurance discrimination, dSprites image data), SKCI consistently outperforms existing sequential Model-X approaches, particularly in the "Pretrained" and "Online" modes, where conditional distributions are estimated. Code is available at https://github.com/he-zh/SKCI.

Key takeaway

For machine learning engineers developing or auditing models on continuously arriving data, SKCI offers a sequential conditional independence test that remains reliable when conditional distributions must be estimated rather than assumed known. In the reported experiments it reduces Type I error inflation relative to existing sequential Model-X approaches while maintaining high power, including in challenging high-dimensional settings. Consider integrating SKCI where statistical inference or fairness auditing has to run online.

Key insights

SKCI offers robust sequential conditional independence testing even when conditional distributions are estimated, not exactly known.

Principles

In CI testing, exact Type I error control is generally impossible without strong assumptions.
Test statistics should be chosen to accumulate evidence quickly under weak signals.
Sequential data partitioning (training, validation, test) is crucial for predictable payoff functions.
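
The partitioning principle can be illustrated with a minimal sketch. It assumes data arrive in batches and that a payoff function is fit on earlier observations only, so that it is fixed before the batch it is evaluated on is seen (which is what "predictable" means here). The function name `predictable_splits` and the `val_fraction` parameter are hypothetical and are not taken from the SKCI code.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def predictable_splits(
    batches: Iterable[Sequence],
    val_fraction: float = 0.25,
) -> Iterator[Tuple[List, List, List]]:
    """Yield (train, val, test) triples for a stream of batches.

    Hypothetical helper: train and val are drawn only from observations
    that arrived strictly before the current batch, and the current batch
    is used purely as test data. Anything fit on train/val is therefore
    determined before the test batch is observed.
    """
    history: List = []
    for batch in batches:
        batch = list(batch)
        if len(history) >= 2:
            # Hold out the most recent past observations for validation,
            # always leaving at least one observation for training.
            n_val = max(1, int(len(history) * val_fraction))
            n_val = min(n_val, len(history) - 1)
            train, val = history[:-n_val], history[-n_val:]
            yield train, val, batch
        # Only now does the current batch become part of the past.
        history.extend(batch)


if __name__ == "__main__":
    stream = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    for train, val, test in predictable_splits(stream):
        print(len(train), len(val), len(test))
```

The first batch yields no split because there is no past data to fit on; it only seeds the history.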

Method

SKCI applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, combined with self-normalization and a truncate-and-shift calibration strategy in which the shift is estimated via a Gaussian approximation.
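
To make the testing-by-betting idea concrete, here is a generic sketch of a wealth process, not the SKCI implementation. It assumes a stream of payoffs that, after truncation to [-1, 1], have conditional mean at most zero under the null. The wealth is updated as W_t = W_{t-1} (1 + bet_t * g_t), with bet_t chosen from past payoffs only; the wealth is then a nonnegative supermartingale under the null, and by Ville's inequality rejecting once W_t reaches 1/alpha keeps the Type I error at most alpha at any stopping time. The betting rule below is a simple plug-in choice made for illustration, truncation is shown only as clipping, and the kernel statistic, self-normalization, and shift calibration of SKCI are not reproduced. The names `betting_test` and `max_bet` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def betting_test(
    payoffs: Iterable[float],
    alpha: float = 0.05,
    max_bet: float = 0.5,
) -> Tuple[Optional[int], float]:
    """Generic sequential test by betting (illustrative sketch).

    Returns (stopping_time, wealth). stopping_time is None if the
    wealth never reaches 1 / alpha on the given stream.
    """
    wealth = 1.0
    sum_g = 0.0   # running sum of past truncated payoffs
    sum_g2 = 0.0  # running sum of their squares
    threshold = 1.0 / alpha

    for t, raw in enumerate(payoffs, start=1):
        # Truncate the payoff so that 1 + bet * g stays positive.
        g = max(-1.0, min(1.0, raw))

        # Predictable bet: depends on payoffs before time t only.
        # Plug-in rule (an assumption of this sketch): ratio of the past
        # mean to the past second moment, clipped to [0, max_bet].
        if sum_g2 > 0.0:
            bet = max(0.0, min(max_bet, sum_g / sum_g2))
        else:
            bet = 0.0

        wealth *= 1.0 + bet * g
        if wealth >= threshold:
            return t, wealth  # reject the null at time t

        # Update the history after betting, keeping the bet predictable.
        sum_g += g
        sum_g2 += g * g

    return None, wealth


if __name__ == "__main__":
    stop, wealth = betting_test([0.8] * 50)
    print(stop, wealth)
```

With consistently positive payoffs the wealth grows geometrically and the test stops early; with payoffs that are never positive on average, the bet stays at zero and the wealth stays at one.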

In practice

Apply SKCI to audit for fairness in systems like car insurance pricing.
Use SKCI for robust conditional independence testing in high-dimensional online data streams.
Leverage the provided code at https://github.com/he-zh/SKCI for implementation.

Topics

Conditional Independence Testing
Sequential Hypothesis Testing
Kernel Methods
Model-X Paradigm
Fairness Auditing
Anytime-Valid Testing

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.