Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Mathematics & Computational Sciences) · Depth: Expert, extended

Summary

This study introduces Prediction-Powered Causal Inference (PPCI), a framework for semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. It leverages unlabeled auxiliary regressors alongside labeled observations to achieve smaller asymptotic variances than methods using only labeled data. The research derives the efficient influence function and efficiency bounds, demonstrating that unlabeled data can reduce the regressor-averaging component of the efficiency bound. The proposed methods, called DML-PPCI (Debiased Machine Learning-PPCI), include Estimating-Equation (EE-DML-PPCI) and Targeted Maximum Likelihood (TMLE-DML-PPCI) estimators. A key component is the development of semi-supervised generalized Riesz regression for estimating the Riesz representer, with convergence rate guarantees for various function classes, including deep ReLU sieves.

Key takeaway

Data scientists and machine learning engineers working on causal inference should consider pooling unlabeled auxiliary regressor data with their labeled observations through the DML-PPCI framework. Doing so can lower the asymptotic variance of causal parameter estimates, such as the ATE or APE, relative to estimators that use labeled data only. Semi-supervised generalized Riesz regression is the component that puts the unlabeled data to work when estimating the Riesz representer, and the paper gives convergence rate guarantees for function classes that include deep ReLU networks.

Key insights

Unlabeled auxiliary regressors can reduce asymptotic variance in causal inference by shrinking the regressor-averaging component of the efficiency bound.
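
To see why only part of the variance can shrink, the decomposition below is a heuristic sketch rather than the paper's stated bound. It assumes a parameter of the form theta = E[m(X, gamma_0)] with gamma_0(X) = E[Y | X], a Riesz representer alpha_0, n labeled observations, N unlabeled observations drawn from the same regressor distribution, and nuisances treated as known. All symbols here are illustrative notation, not taken from the source.

```latex
% Idealized semi-supervised estimator (nuisances treated as known):
\hat{\theta}
  = \frac{1}{n+N} \sum_{i=1}^{n+N} m(X_i, \gamma_0)
  + \frac{1}{n} \sum_{i=1}^{n} \alpha_0(X_i)\bigl(Y_i - \gamma_0(X_i)\bigr)

% The two sums are uncorrelated because E[Y - \gamma_0(X) \mid X] = 0, so
\operatorname{Var}(\hat{\theta})
  = \underbrace{\frac{\operatorname{Var}\bigl(m(X, \gamma_0)\bigr)}{n+N}}_{\text{regressor averaging}}
  + \underbrace{\frac{E\bigl[\alpha_0(X)^2 \bigl(Y - \gamma_0(X)\bigr)^2\bigr]}{n}}_{\text{residual correction}}
```

With N = 0 this reduces to the usual labeled-only variance divided by n. As N grows, the first term vanishes while the second stays fixed, because the residual Y - gamma_0(X) can only be computed where Y is observed.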

Method

DML-PPCI combines the efficient influence function with debiased machine learning. The final estimator is built either from estimating equations (EE-DML-PPCI) or by targeted maximum likelihood (TMLE-DML-PPCI). In both cases the Riesz representer, a nuisance parameter, is estimated by semi-supervised generalized Riesz regression.
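
The estimating-equation idea can be illustrated with a minimal Python sketch for the ATE. This is not the paper's implementation: it assumes a binary treatment D, a scalar covariate Z, unlabeled regressors drawn from the same distribution as the labeled ones, a squared-loss Riesz regression with a small linear basis (which has a closed-form solution), and no cross-fitting. The function names, bases, and simulated data are all hypothetical.

```python
import numpy as np


def outcome_basis(d, z):
    """Hypothetical basis for the outcome regression E[Y | D, Z]."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.column_stack([np.ones_like(z), d, z, d * z])


def riesz_basis(d, z):
    """Hypothetical (slightly richer) basis for the Riesz representer."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.column_stack([np.ones_like(z), d, z, d * z, z**2, d * z**2])


def contrast(basis_fn, z):
    """ATE functional applied to a basis: b(1, Z) - b(0, Z)."""
    z = np.asarray(z, dtype=float)
    return basis_fn(np.ones_like(z), z) - basis_fn(np.zeros_like(z), z)


def semi_supervised_ate(y_lab, d_lab, z_lab, d_unl, z_unl, ridge=1e-8):
    # Pool regressors: neither the Riesz loss nor the plug-in average needs Y.
    d_all = np.concatenate([d_lab, d_unl])
    z_all = np.concatenate([z_lab, z_unl])

    # Outcome regression by least squares, labeled data only.
    B_lab = outcome_basis(d_lab, z_lab)
    beta, *_ = np.linalg.lstsq(B_lab, y_lab, rcond=None)

    # Squared-loss Riesz regression on pooled regressors:
    # rho minimizes mean[(r'rho)^2 - 2 * contrast(r)'rho], solved in closed form.
    R_all = riesz_basis(d_all, z_all)
    gram = R_all.T @ R_all / len(z_all) + ridge * np.eye(R_all.shape[1])
    moment = contrast(riesz_basis, z_all).mean(axis=0)
    rho = np.linalg.solve(gram, moment)

    # Regressor-averaging term uses all n + N points.
    plug_in = (contrast(outcome_basis, z_all) @ beta).mean()
    # Residual correction uses labeled points only.
    alpha_lab = riesz_basis(d_lab, z_lab) @ rho
    correction = (alpha_lab * (y_lab - B_lab @ beta)).mean()
    return plug_in + correction


# Simulated example (assumed data-generating process, true ATE = 2.0).
rng = np.random.default_rng(0)
n_lab, n_unl = 2000, 20000
z = rng.normal(size=n_lab + n_unl)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * z)))
y = 2.0 * d + z + 0.5 * d * z + rng.normal(size=n_lab + n_unl)

estimate = semi_supervised_ate(
    y[:n_lab], d[:n_lab], z[:n_lab], d[n_lab:], z[n_lab:]
)
print(estimate)
```

The outcome values for the last 20,000 rows are simulated but never passed to the estimator, which mimics the unlabeled sample. The two bases differ on purpose: if the outcome regression and the Riesz representer shared one basis under least squares, the correction term would be exactly zero by orthogonality of the residuals. A real application would replace the linear bases with flexible learners and add cross-fitting.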

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.