Why are there squares everywhere in statistics (e.g., normal density, variance, least squares, etc.)?
Summary
The article takes up the common question of "why squares?" in statistical calculations such as the variance, tracing the answer back to Gauss and Pythagoras. It explains that the mean is the value that minimizes square error, that is, the sum of squared differences from the data: ARGMIN_mu SUM{n in 1:N} (x[n] - mu)^2 = mean(x). This property is fundamental to the Gaussian (normal) distribution, where the sample mean is the maximum likelihood estimate of the location parameter, and it connects directly to ordinary least squares regression. The author contrasts this with absolute values, where the median minimizes absolute error: ARGMIN_mu SUM{n in 1:N} abs(x[n] - mu) = median(x). The discussion extends to Bayesian inference, noting that posterior means minimize expected square error while posterior medians minimize expected absolute error, so the choice of error function determines the point estimate. Finally, the article links squared error to squared Euclidean distance and introduces the multivariate normal distribution, where a quadratic form involving the inverse covariance matrix defines Mahalanobis distance.
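To make the last point concrete, here is a minimal sketch of squared Mahalanobis distance for the two-dimensional case, written in plain Python. The function names, the hand-rolled 2x2 inverse, and the example vectors are illustrative assumptions, not code from the original article. With an identity covariance matrix the quadratic form reduces to squared Euclidean distance; in the multivariate normal log density, this quadratic form appears multiplied by -1/2.

```python
def squared_euclidean(x, mu):
    """Sum of squared coordinate differences between x and mu."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]] (illustrative helper)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def squared_mahalanobis(x, mu, cov):
    """Quadratic form (x - mu)^T inverse(cov) (x - mu) for 2-D inputs."""
    inv = inverse_2x2(cov)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


# Made-up example values.
x = [3.0, 4.0]
mu = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # variance 4 along the first axis

d_identity = squared_mahalanobis(x, mu, identity)    # matches squared Euclidean
d_stretched = squared_mahalanobis(x, mu, stretched)  # first axis is down-weighted
```

A larger variance along an axis shrinks that axis's contribution to the distance, which is why the inverse covariance, rather than the covariance itself, appears in the quadratic form.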
Key takeaway
For data scientists and machine learning engineers designing models or interpreting statistical results, it helps to understand why errors are squared. The choice between mean and median, or between squared and absolute error, determines which modeling assumptions you are making and how robust your estimates are. Squared error corresponds to the Gaussian distribution and its maximum likelihood estimate, the mean; absolute error is minimized by the median, which is far less sensitive to outliers. Let the characteristics of your data guide which error metric you choose.
Key insights
The mean minimizes square error and the median minimizes absolute error; the first of these facts underlies the normal distribution and least squares.
Principles
- Mean minimizes square error.
- Median minimizes absolute error.
- Error function dictates Bayesian point estimates.
Method
To find the value that minimizes square error for a dataset, calculate the mean. To find the value that minimizes absolute error, calculate the median.
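As a quick numerical check of these two rules, the sketch below brute-forces both minimizations over a grid of candidate values and compares the winners with the mean and median. The dataset, the grid, and the helper names are made up for illustration and do not come from the original article.

```python
import statistics


def sum_squared_error(xs, mu):
    """Sum over the data of (x - mu)^2."""
    return sum((x - mu) ** 2 for x in xs)


def sum_absolute_error(xs, mu):
    """Sum over the data of |x - mu|."""
    return sum(abs(x - mu) for x in xs)


# Made-up data with one large outlier.
xs = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_x = statistics.mean(xs)
median_x = statistics.median(xs)

# Candidate values 0.0, 0.1, ..., 100.0 (an arbitrary illustrative grid).
candidates = [i / 10 for i in range(0, 1001)]

# Pick the candidate with the smallest total error under each error function.
best_for_squares = min(candidates, key=lambda mu: sum_squared_error(xs, mu))
best_for_abs = min(candidates, key=lambda mu: sum_absolute_error(xs, mu))

print("mean:", mean_x, "grid minimizer of square error:", best_for_squares)
print("median:", median_x, "grid minimizer of absolute error:", best_for_abs)
```

The single outlier drags the mean far from the bulk of the data while the median stays put, which is the robustness trade-off between the two error functions.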
In practice
- Use mean for Gaussian-distributed data.
- Use median for robust estimation against outliers.
- Specify error function for Bayesian point estimates.
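The last point can be sketched with posterior draws: given samples from a posterior, the sample mean is the estimate with the lowest average squared error against those draws, and the sample median is the one with the lowest average absolute error. In the sketch below the "posterior" is simply simulated from a gamma distribution as a stand-in for real sampler output; the seed, the distribution, and the function names are assumptions made for illustration.

```python
import random
import statistics

random.seed(0)

# Stand-in for posterior draws of a parameter theta (assumed, right-skewed).
draws = [random.gammavariate(2.0, 1.0) for _ in range(5000)]

posterior_mean = statistics.fmean(draws)
posterior_median = statistics.median(draws)


def expected_loss(estimate, loss):
    """Average loss of a point estimate over the posterior draws."""
    return statistics.fmean([loss(theta, estimate) for theta in draws])


def squared_loss(theta, estimate):
    return (theta - estimate) ** 2


def absolute_loss(theta, estimate):
    return abs(theta - estimate)


# Compare both estimates under both error functions.
sq_at_mean = expected_loss(posterior_mean, squared_loss)
sq_at_median = expected_loss(posterior_median, squared_loss)
abs_at_mean = expected_loss(posterior_mean, absolute_loss)
abs_at_median = expected_loss(posterior_median, absolute_loss)

print("squared error:  mean", sq_at_mean, "median", sq_at_median)
print("absolute error: mean", abs_at_mean, "median", abs_at_median)
```

For a skewed posterior like this one the two estimates differ noticeably, so reporting a point estimate without naming the error function leaves the result ambiguous.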
Topics
- Least Squares
- Normal Distribution
- Mean and Median
- Maximum Likelihood Estimation
- Mahalanobis Distance
Best for: AI Student, Data Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.