Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

This study introduces Prediction-Powered Causal Inference (PPCI), a framework for semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. PPCI uses unlabeled auxiliary regressors alongside labeled observations to achieve a smaller asymptotic variance than estimators that rely on the labeled data alone. The research first derives the efficient influence function and the efficiency bound, which quantify the variance reduction attainable through auxiliary regressors. It then proposes DML-PPCI methods, which combine the efficient influence function with the debiased machine learning (DML) framework. Two estimators are detailed, EE-DML-PPCI (estimating-equation) and TMLE-DML-PPCI (targeted-learning), both of which attain the derived efficiency bound. To estimate the efficient influence function, which serves as a Neyman orthogonal score, the study develops semi-supervised generalized Riesz regression with convergence rate guarantees.
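
For orientation, the familiar fully labeled case of the average treatment effect has a Neyman orthogonal score of the form below. The notation (observation W = (X, D, Y), outcome regressions \mu_d, propensity score e, Riesz representer \alpha) is an assumption of this note; the paper's semi-supervised efficient influence function is not reproduced here.

```latex
\psi(W;\theta,\mu,\alpha)
  = \mu_1(X) - \mu_0(X) + \alpha(D,X)\,\bigl(Y - \mu_D(X)\bigr) - \theta,
\qquad
\alpha(D,X) = \frac{D}{e(X)} - \frac{1-D}{1-e(X)}.
```

The first two terms depend on the regressors X only, which is why additional unlabeled X can in principle sharpen that part of the estimate, while the residual correction needs labeled outcomes.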

Key takeaway

If you are a data scientist or econometrician working on causal inference, consider integrating Prediction-Powered Causal Inference (PPCI) into your estimation workflow. The approach lets you incorporate unlabeled auxiliary regressors to reduce the asymptotic variance of your causal and structural parameter estimates. DML-PPCI estimators such as EE-DML-PPCI or TMLE-DML-PPCI attain the efficiency bound, which is most valuable when labeled data are limited.

Key insights

Prediction-Powered Causal Inference (PPCI) uses unlabeled data to significantly reduce asymptotic variance in causal and structural parameter estimation.
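
A rough Monte Carlo sketch of where such a variance reduction can come from, in a simplified setting that is an assumption of this note rather than the paper's setup: a randomized binary treatment with known propensity 0.5, linear outcome models fit by cross-fitted OLS, and unlabeled units that carry only regressors X. The pooled estimator averages the regression part over labeled and unlabeled X, while the residual correction uses labeled data only. All function names are hypothetical, and this is not the paper's EE-DML-PPCI or TMLE-DML-PPCI estimator.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on an intercept and X."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def ate_estimates(X_lab, D, Y, X_unl, e=0.5, n_folds=2, rng=None):
    """Return (labeled-only AIPW estimate, pooled semi-supervised variant)."""
    rng = np.random.default_rng() if rng is None else rng
    n, N = len(X_lab), len(X_unl)
    fold_lab = rng.permutation(n) % n_folds   # cross-fitting folds, labeled
    fold_unl = rng.permutation(N) % n_folds   # cross-fitting folds, unlabeled
    tau_lab, corr, tau_unl = np.empty(n), np.empty(n), np.empty(N)
    for k in range(n_folds):
        train = fold_lab != k                 # nuisances use labeled data only
        b1 = fit_ols(X_lab[train & (D == 1)], Y[train & (D == 1)])
        b0 = fit_ols(X_lab[train & (D == 0)], Y[train & (D == 0)])
        te = fold_lab == k
        m1, m0 = predict_ols(b1, X_lab[te]), predict_ols(b0, X_lab[te])
        tau_lab[te] = m1 - m0
        # Riesz representer for the ATE under a known propensity e (assumption)
        alpha = D[te] / e - (1 - D[te]) / (1 - e)
        corr[te] = alpha * (Y[te] - np.where(D[te] == 1, m1, m0))
        tu = fold_unl == k
        tau_unl[tu] = predict_ols(b1, X_unl[tu]) - predict_ols(b0, X_unl[tu])
    labeled_only = np.mean(tau_lab + corr)
    pooled = (tau_lab.sum() + tau_unl.sum()) / (n + N) + corr.mean()
    return labeled_only, pooled


def simulate(n, N, rng):
    """Toy design with true ATE = 1 and an effect that varies with X[:, 0]."""
    X = rng.normal(size=(n + N, 2))
    X_lab, X_unl = X[:n], X[n:]
    D = rng.binomial(1, 0.5, size=n)
    Y = X_lab[:, 1] + D * (1.0 + 2.0 * X_lab[:, 0]) + rng.normal(size=n)
    return X_lab, D, Y, X_unl


def monte_carlo(n=200, N=2000, reps=500, seed=0):
    """Monte Carlo means and variances of the two estimators."""
    rng = np.random.default_rng(seed)
    draws = np.array(
        [ate_estimates(*simulate(n, N, rng), rng=rng) for _ in range(reps)]
    )
    return draws.mean(axis=0), draws.var(axis=0)
```

Calling `monte_carlo()` returns the means and variances of the labeled-only and pooled estimators, in that order. In this toy design the gain comes from the variation of the treatment effect across X, which the extra unlabeled regressors help average out.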

Principles

Method

DML-PPCI combines efficient influence functions with debiased machine learning. It uses semi-supervised generalized Riesz regression to estimate the Riesz representer for Neyman orthogonal scores.
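
One way to picture the Riesz regression step is the squared-loss special case with a linear basis, sketched below for the average treatment effect. Minimizing the empirical version of E[alpha(D, X)^2 - 2 (alpha(1, X) - alpha(0, X))] over alpha = b' rho has the closed form rho = G^{-1} M. The basis, the small ridge term, and the choice to pool unlabeled X in the term M are assumptions of this sketch; the paper's generalized, semi-supervised procedure may differ in its details.

```python
import numpy as np


def basis(d, X):
    """Hypothetical basis b(d, x): separate linear terms per treatment arm."""
    d = np.asarray(d, dtype=float).reshape(-1, 1)
    Z = np.column_stack([np.ones(len(X)), X])
    return np.hstack([d * Z, (1.0 - d) * Z])


def riesz_regression(X_lab, D, X_unl=None, ridge=1e-8):
    """Squared-loss Riesz regression for the ATE with a linear basis."""
    B = basis(D, X_lab)
    G = B.T @ B / len(B)                      # needs D, so labeled data only
    # M depends on X alone, so unlabeled X can be pooled in (sketch assumption)
    X_all = X_lab if X_unl is None else np.vstack([X_lab, X_unl])
    ones, zeros = np.ones(len(X_all)), np.zeros(len(X_all))
    M = (basis(ones, X_all) - basis(zeros, X_all)).mean(axis=0)
    return np.linalg.solve(G + ridge * np.eye(G.shape[0]), M)


def alpha_hat(rho, d, X):
    """Evaluate the fitted Riesz representer at treatment d and regressors X."""
    return basis(d, X) @ rho
```

With a randomized treatment of probability 0.5, the fitted `alpha_hat` should be close to D / 0.5 - (1 - D) / 0.5, the known representer in that design, without the propensity score ever being estimated directly.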

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.