# Why Most Data Science Doesn’t Answer the Question You’re Asking

2026-05-20 · Source: Data Science on Medium · Field: Technology & Digital — Data Science & Analytics, Artificial Intelligence & Machine Learning · Depth: Novice, medium

Summary

This article introduces causal inference, a critical but often overlooked area of data science, by highlighting the distinction between correlation and causation. It explains that many analyses mistakenly infer causality from correlation, which leads to flawed business decisions, such as reading feature engagement as a direct cause of increased spending when engaged users may simply be the ones who would have spent more anyway. The core challenge, known as the "Fundamental Problem of Causal Inference," is that only one outcome can be observed per unit (e.g., a user either saw a feature or didn't, but never both). The series outlines five strategies for reconstructing the unobserved counterfactual: A/B Testing (Randomized Controlled Trials), Difference-in-Differences, Propensity Score Matching, Synthetic Control, and Regression Discontinuity. Each method addresses a different data situation and relies on specific assumptions that data scientists need to understand.

Key takeaway

For Data Scientists and AI Product Managers evaluating feature impact or policy effectiveness, understanding causal inference is crucial. Your models might predict outcomes accurately, but without causal analysis you risk attributing effects to the wrong causes and choosing interventions that do not move the outcome. Frame questions as "what causes Y?" rather than "what predicts Y?" and explore methods like A/B testing or Difference-in-Differences so that your conclusions reflect what an intervention would actually change.

Key insights

Distinguishing correlation from causation is fundamental for making effective, intervention-based decisions in data science.

Principles

Correlation does not imply causation.
Confounding variables contaminate naive comparisons.
The counterfactual is always missing.
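
These three principles can be illustrated with a small simulation. The sketch below is not from the original article: the data-generating process, the parameter values, and the function names (`simulate_users`, `naive_difference`, `stratified_difference`) are all hypothetical. It assumes "heavy" users are both more likely to use a feature and more likely to spend, so the confounder inflates the naive comparison even though the true effect of the feature is small.

```python
import random
from statistics import mean


def simulate_users(n=20000, true_effect=2.0, randomized=False, seed=0):
    """Hypothetical data: 'heavy' users spend more regardless of the feature.

    If randomized is False, heavy users are also more likely to use the
    feature (confounding). If True, usage is assigned by coin flip, as in
    an A/B test.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        heavy = rng.random() < 0.5  # the confounder
        p_use = 0.5 if randomized else (0.8 if heavy else 0.2)
        used = rng.random() < p_use
        spend = (
            10.0
            + (20.0 if heavy else 0.0)        # effect of the confounder
            + (true_effect if used else 0.0)  # causal effect of the feature
            + rng.gauss(0, 1)                 # noise
        )
        rows.append((heavy, used, spend))
    return rows


def naive_difference(rows):
    """Mean spend of feature users minus mean spend of non-users."""
    treated = [s for _, used, s in rows if used]
    control = [s for _, used, s in rows if not used]
    return mean(treated) - mean(control)


def stratified_difference(rows):
    """Compare within each confounder level, then weight by stratum size."""
    total = 0.0
    for level in (True, False):
        stratum = [(used, s) for heavy, used, s in rows if heavy == level]
        treated = [s for used, s in stratum if used]
        control = [s for used, s in stratum if not used]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total


observational = simulate_users(randomized=False)
experiment = simulate_users(randomized=True)

naive = naive_difference(observational)          # contaminated by the confounder
adjusted = stratified_difference(observational)  # close to the true effect
randomized = naive_difference(experiment)        # close to the true effect

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  randomized: {randomized:.2f}")
```

Under these assumed parameters the naive difference should land far above the true effect of 2.0 (around 14 in expectation), while both the stratified comparison and the randomized comparison should land near 2.0. Stratification only works here because the confounder is observed; randomization removes the problem without needing to measure it.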

Method

Causal inference methods reconstruct counterfactuals using strategies like A/B testing, Difference-in-Differences, Propensity Score Matching, Synthetic Control, and Regression Discontinuity, each suited for different data scenarios.
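
The missing counterfactual can be stated compactly in standard potential-outcomes notation. The symbols below are not taken from the original article; they are the usual textbook convention, with $D_i$ indicating whether unit $i$ was treated and $Y_i(1)$, $Y_i(0)$ denoting the outcomes with and without treatment.

$$
\tau_i = Y_i(1) - Y_i(0), \qquad Y_i = D_i\,Y_i(1) + (1 - D_i)\,Y_i(0)
$$

Only $Y_i$ is observed, so $\tau_i$ can never be computed for a single unit. A naive comparison of group means decomposes as

$$
E[Y \mid D = 1] - E[Y \mid D = 0]
= \underbrace{E[Y(1) - Y(0) \mid D = 1]}_{\text{effect on the treated}}
+ \underbrace{E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0]}_{\text{selection bias}}
$$

Randomized assignment makes the selection-bias term zero in expectation. The other methods can be read as different assumptions under which that term can be removed or estimated from observational data.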

In practice

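Use A/B tests when you can randomize treatment assignment.
Apply Difference-in-Differences when you have before-and-after data for both a treated group and a comparison group.
Employ Propensity Score Matching for observational data where treatment was not randomized and the relevant confounders are measured.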
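
As a rough illustration of the Difference-in-Differences idea, the sketch below subtracts the comparison group's before-after change from the treated group's before-after change. The function name `diff_in_diff` and the spend figures are hypothetical, not from the original article, and the estimate is only meaningful under the assumption that both groups would have followed parallel trends without the treatment.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """(change in treated group) minus (change in comparison group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical average weekly spend per user, before and after a feature launch.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

# A plain before-after comparison credits the feature with the whole change.
before_after = mean(treated_post) - mean(treated_pre)

# DiD removes the change that the comparison group experienced as well.
estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)

print(f"before-after: {before_after:.1f}  diff-in-diff: {estimate:.1f}")
```

In this made-up example the treated group's spend rises by 5.0, but the comparison group's spend rises by 2.0 over the same period, so the Difference-in-Differences estimate is 3.0.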

Topics

Causal Inference
Confounding
A/B Testing
Difference-in-Differences
Propensity Score Matching

Code references

Ajay-Deshpande/applied-causal-inference

Best for: Data Scientist, Director of AI/ML, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Science on Medium.