Correlation Doesn’t Mean Causation! But What Does It Mean?
Summary
The article clarifies the precise mathematical definition of correlation, moving beyond the common adage "correlation doesn't imply causation." It explains correlation as a measurement of how two variables move together relative to their averages, rather than a vague indication of relatedness. Using the Pearson correlation coefficient, defined as $r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$, the text breaks down its calculation into covariance and normalization steps, resulting in a value between -1 and 1. Examples like pizza consumption and math scores, or ice cream sales and drowning incidents, illustrate how correlation can exist without causation, often due to hidden variables. A crucial limitation highlighted is that correlation only measures linear relationships, potentially missing strong nonlinear patterns like $y = x^2$.
Key takeaway
For data scientists and analysts interpreting relationships between variables, understand that correlation is a precise mathematical measurement of linear co-movement, not a vague concept. Do not assume causation from correlation alone; always consider potential hidden variables or nonlinear relationships. Use correlation as a valuable first signal to identify patterns that warrant deeper investigation, rather than dismissing it as meaningless.
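The hidden-variable caveat can be illustrated with a small simulation. This is a minimal sketch with made-up numbers, not data from the article: a hypothetical `temperature` variable drives both `ice_cream_sales` and `drownings`, and neither of those two ever appears in the other's formula. The partial-correlation step at the end is a standard technique added here for illustration; the summary does not say the article uses it.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical hidden variable: daily temperature in degrees Celsius.
temperature = [random.uniform(10, 35) for _ in range(500)]

# Both outcomes depend on temperature plus independent noise.
# Neither outcome is computed from the other, so there is no causal link between them.
ice_cream_sales = [20 * t + random.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 1) for t in temperature]

r_xy = pearson_r(ice_cream_sales, drownings)
r_xz = pearson_r(ice_cream_sales, temperature)
r_yz = pearson_r(drownings, temperature)

# Partial correlation of sales and drownings, controlling for temperature.
partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"raw correlation:     {r_xy:.2f}")
print(f"partial correlation: {partial:.2f}")
```

Under these assumptions the raw correlation comes out strongly positive, while the partial correlation stays close to zero once temperature is accounted for.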
Key insights
Correlation is a precise measure of how two variables move together. It does not measure causation, and it does not capture every type of relationship.
Principles
- Correlation measures consistent co-movement.
- Correlation is measured relative to each variable's average.
- Correlation only detects linear relationships.
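Written in terms of deviations from the averages, the Pearson coefficient makes these principles visible. The notation is an assumption added here, not taken from the article: $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Each term in the numerator is positive when $x_i$ and $y_i$ fall on the same side of their means and negative when they fall on opposite sides, so $r$ is large in magnitude only when that pattern is consistent across observations.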
Method
Calculate the Pearson correlation coefficient by dividing the covariance of X and Y by the product of their standard deviations, which normalizes the result to the range -1 to 1.
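The two steps of the method can be sketched in plain Python. The function name `pearson_r` and the sample numbers are hypothetical, chosen only to echo the article's pizza and math score example; they are not data from the article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 1: covariance, the average product of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # Step 2: normalize by both standard deviations to remove the units.
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Hypothetical data: weekly pizza slices and math scores for five students.
pizza_slices = [1, 2, 3, 4, 5]
math_scores = [52, 60, 61, 70, 77]

r = pearson_r(pizza_slices, math_scores)
print(f"r = {r:.2f}")  # about 0.98 for this made-up data
```

The covariance here divides by n rather than n - 1. Either convention gives the same r, as long as the standard deviations use the same divisor, because the factor cancels in the ratio.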
In practice
- Use correlation as an initial signal for patterns.
- Investigate hidden variables for causal links.
- Plot data to identify nonlinear relationships.
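The reason plotting matters can be sketched with the $y = x^2$ case from the summary. The sample points below are an illustrative choice, not taken from the article: over a range symmetric around zero, y is completely determined by x, yet the Pearson coefficient is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Symmetric range: the falling left half cancels the rising right half.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_symmetric = pearson_r(xs, ys)

# Right half only: the same curve now looks close to a rising line.
xs_half = [0, 1, 2, 3]
ys_half = [x ** 2 for x in xs_half]
r_half = pearson_r(xs_half, ys_half)

print(f"symmetric range: r = {r_symmetric:.2f}")
print(f"right half only: r = {r_half:.2f}")
```

The same function yields a coefficient of zero on one range and a high coefficient on another, which is why a scatter plot is a useful check before reading a low r as "no relationship".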
Best for: Data Scientist, Data Analyst, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.