Causal Inference Is Eating Machine Learning

Source: Towards Data Science · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Intermediate, long

Summary

A health-tech company's readmission-prediction model reached 94% accuracy on test data, yet readmission rates rose after the company acted on its predictions, because the model had learned correlations rather than causal factors. It flagged older patients and specific zip codes, but interventions aimed at those groups failed: the true causes were socioeconomic factors such as medication affordability and access to transportation. This highlights a critical distinction between associational reasoning (prediction) and causal inference (answering "what should we do?"). Judea Pearl's Ladder of Causation explains the gap: most machine learning operates at Level 1 (Association), while business decisions require Level 2 (Intervention) or Level 3 (Counterfactual). The Hormone Replacement Therapy case and Simpson's paradox show how conflating prediction with causation can lead to harmful outcomes, and a 2021 review found that 26% of observational medical studies made this error. Modern tools such as Microsoft's DoWhy and EconML, now part of the PyWhy project, are making causal inference accessible, and the global Causal AI market is projected to reach $116 billion by 2026.
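Simpson's paradox is the easiest of these failures to see in numbers. The minimal sketch below uses the kidney-stone treatment counts commonly used to illustrate the paradox: recoveries out of patients treated, split by stone size. For illustration, it assumes that stone size is the only confounder.

Under that assumption, the two rates answer different questions:
- The pooled recovery rate is a Level 1, associational quantity, and it favors treatment B.
- The backdoor-adjusted rate estimates the Level 2 quantity P(recovery | do(treatment)), and it favors treatment A.

The function names are illustrative and do not come from any library.

```python
# Kidney-stone counts often used to illustrate Simpson's paradox:
# (treatment, stone size) -> (patients recovered, patients treated)
DATA = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}
SIZES = ("small", "large")


def pooled_rate(treatment: str) -> float:
    """Level 1 (association): recovery rate ignoring stone size."""
    recovered = sum(DATA[(treatment, s)][0] for s in SIZES)
    treated = sum(DATA[(treatment, s)][1] for s in SIZES)
    return recovered / treated


def adjusted_rate(treatment: str) -> float:
    """Level 2 (intervention) via backdoor adjustment on stone size:
    sum over sizes z of P(recovery | treatment, z) * P(z).
    Valid only if stone size is the sole confounder (an assumption here)."""
    total = sum(n for _, n in DATA.values())
    rate = 0.0
    for s in SIZES:
        p_size = sum(DATA[(t, s)][1] for t in ("A", "B")) / total
        recovered, treated = DATA[(treatment, s)]
        rate += (recovered / treated) * p_size
    return rate


for t in ("A", "B"):
    print(f"{t}: pooled={pooled_rate(t):.3f}  adjusted={adjusted_rate(t):.3f}")
```

Running it prints pooled rates of 0.780 (A) and 0.826 (B), but adjusted rates of 0.833 (A) and 0.779 (B). The ranking flips once the confounder is accounted for, which is exactly the prediction-versus-intervention gap the summary describes.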

Key takeaway

If you are an AI Product Manager or Data Scientist building models that inform decisions, you must distinguish prediction from causation. If acting on your model's recommendations would change the relationships it learned from, you have left prediction territory. Use causal inference tools such as DoWhy to move beyond correlations and identify the true causal drivers, so that your interventions produce the intended outcomes and avoid costly failures, as the health-tech company's 18% drop in readmission rate demonstrates.

Key insights

Distinguishing between prediction and causation is crucial for effective decision-making in AI applications.

Method

DoWhy's four-step causal analysis: model the causal assumptions, identify the estimand, estimate the effect, and refute the result to test its robustness.
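The four steps can be sketched without DoWhy on simulated data, which makes each one visible. This is a minimal sketch under stated assumptions. The graph, the variable names, the simulated effect sizes and the helper functions (simulate, backdoor_set, adjusted_effect, naive_effect, placebo_refute) are all hypothetical and are not DoWhy's API.

In the simulation:
- A confounder W (loosely, "can afford medication") drives both a treatment T (loosely, "enrolled in a follow-up program") and readmission Y.
- The true effect of T is fixed at -0.10.

In DoWhy itself, the corresponding steps are constructing a CausalModel and calling its identify_effect, estimate_effect and refute_estimate methods. The placebo check below mirrors one of the refuters DoWhy offers.

```python
import random
import statistics

# Step 1, model: a hypothetical causal graph (parent -> children).
# W (e.g. can afford medication) -> T (follow-up program), W -> Y (readmitted).
GRAPH = {"W": ["T", "Y"], "T": ["Y"], "Y": []}
TRUE_EFFECT = -0.10  # simulated effect of T on readmission probability


def simulate(n=20_000, seed=0):
    """Generate (w, t, y) rows consistent with GRAPH."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if w else 0.2))  # W confounds T
        y = int(rng.random() < 0.40 - 0.25 * w + TRUE_EFFECT * t)
        rows.append((w, t, y))
    return rows


# Step 2, identify: the treatment's parents form a valid backdoor
# adjustment set, provided the graph is correct and nothing is unobserved.
def backdoor_set(graph, treatment):
    return [p for p, children in graph.items() if treatment in children]


# Step 3, estimate: average the within-stratum treated-minus-control
# difference, weighting each stratum by its share of the data.
# Simplification: assumes one binary confounder stored at position 0.
def adjusted_effect(rows):
    effect = 0.0
    for w in (0, 1):
        stratum = [(t, y) for (wi, t, y) in rows if wi == w]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        diff = statistics.mean(treated) - statistics.mean(control)
        effect += diff * len(stratum) / len(rows)
    return effect


def naive_effect(rows):
    """Level 1 comparison that ignores W entirely."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


# Step 4, refute: swap T for a shuffled placebo; a sound estimator
# should then report an effect close to zero.
def placebo_refute(rows, seed=1):
    rng = random.Random(seed)
    placebo_t = [t for _, t, _ in rows]
    rng.shuffle(placebo_t)
    return adjusted_effect([(w, pt, y) for (w, _, y), pt in zip(rows, placebo_t)])


rows = simulate()
print("adjust for:", backdoor_set(GRAPH, "T"))
print(f"naive:    {naive_effect(rows):+.3f}")
print(f"adjusted: {adjusted_effect(rows):+.3f}")
print(f"placebo:  {placebo_refute(rows):+.3f}")
```

In expectation, the naive difference is about -0.25, more than double the true effect. This happens because treated patients are disproportionately those who could already afford medication. The adjusted estimate should land near -0.10 and the placebo estimate near zero. If the assumed graph were wrong, for example because of an unobserved confounder, the adjusted number would be biased too. This is why the first step matters as much as the estimator.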

Best for: Machine Learning Engineer, Data Scientist, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.