Why are there squares everywhere in statistics (e.g., normal density, variance, least squares, etc.)?

· Source: Statistical Modeling, Causal Inference, and Social Science · Field: Science & Research — Mathematics & Computational Sciences · Depth: Intermediate, short

Summary

The article takes up a common question in statistics, "why squares?", which arises in calculations such as variance, and traces the answer back to Gauss and Pythagoras. It explains that the mean minimizes squared error, that is, the sum of squared differences between the data and a candidate value: ARGMIN_mu SUM{n in 1:N} (x[n] - mu)^2 = mean(x). This fact underlies the Gaussian (normal) distribution, for which the sample mean is the maximum likelihood estimator of the location parameter, and it connects directly to ordinary least squares regression. The author contrasts this with absolute values, where the median minimizes absolute error: ARGMIN_mu SUM{n in 1:N} abs(x[n] - mu) = median(x). The discussion extends to Bayesian inference, noting that posterior means minimize expected squared error while posterior medians minimize expected absolute error, so the choice of error function determines the point estimate. Finally, the article links squared error to squared Euclidean distance and introduces the multivariate normal distribution, where a quadratic form involving the inverse covariance matrix defines the (squared) Mahalanobis distance.
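As a rough illustration of the last point, the sketch below computes the quadratic form (x - mu)^T Sigma^{-1} (x - mu) for two-dimensional vectors. The function name `squared_mahalanobis_2d` and the example numbers are made up for illustration and do not come from the original article; the code handles only the 2x2 case so that it needs no external libraries.

```python
def squared_mahalanobis_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu) for 2-vectors."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix must be invertible")
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form dx^T inv dx
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


x = [3.0, 4.0]
mu = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]    # unit variances, no correlation
stretched = [[9.0, 0.0], [0.0, 16.0]]  # variances 9 and 16, no correlation

# With identity covariance the quadratic form is the squared Euclidean distance.
d_identity = squared_mahalanobis_2d(x, mu, identity)
# With larger variances the same point counts as closer to the center.
d_stretched = squared_mahalanobis_2d(x, mu, stretched)
```

With the identity covariance the result is 3^2 + 4^2 = 25; with variances 9 and 16 each coordinate lies exactly one standard deviation from the center, so the result is 2.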

Key takeaway

For data scientists and machine learning engineers who design models or interpret statistical results, it helps to understand why errors are squared. The choice between mean and median, or between squared and absolute error, directly affects a model's assumptions and its robustness. Squared error corresponds to Gaussian distributions and maximum likelihood estimation, while absolute error is more robust to outliers because it is minimized by the median rather than the mean. Keeping this in mind helps in choosing a metric that suits the characteristics of the data.
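A small sketch of the robustness difference, using Python's standard `statistics` module. The data values are invented for illustration and are not taken from the original article.

```python
from statistics import mean, median

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
# Replace the largest value with an extreme outlier.
with_outlier = clean[:-1] + [500.0]

# How far each summary moves when the outlier is introduced.
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)
```

Here the mean moves from 3 to 102 while the median stays at 3, which is the sense in which the absolute-error minimizer is less sensitive to a single extreme value.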

Key insights

The mean minimizes squared error and the median minimizes absolute error; the squared-error case underlies the normal distribution and least squares regression.
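
The squared-error claim can be checked with a short calculation. The following is a sketch in standard notation rather than the article's own, with f introduced here as a name for the sum of squared errors; the last line assumes independent normal observations with known scale sigma.

```latex
\begin{align*}
f(\mu) &= \sum_{n=1}^{N} (x_n - \mu)^2 \\
f'(\mu) &= -2 \sum_{n=1}^{N} (x_n - \mu) = 0
  \quad\Longrightarrow\quad
  \mu = \frac{1}{N} \sum_{n=1}^{N} x_n = \operatorname{mean}(x) \\
\log \prod_{n=1}^{N} \operatorname{normal}(x_n \mid \mu, \sigma)
  &= -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2
     - N \log\!\left(\sigma \sqrt{2\pi}\right)
\end{align*}
```

Because f''(mu) = 2N > 0, the stationary point is a minimum. The log likelihood depends on mu only through the negated sum of squares, so maximizing the likelihood over mu is the same as minimizing squared error.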

Method

To find the single value that minimizes the sum of squared errors over a dataset, compute the mean. To find the value that minimizes the sum of absolute errors, compute the median.
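
The method can be checked numerically with a brute-force search over candidate values. This is a minimal sketch with made-up data; the helper names `sum_squared_error` and `sum_absolute_error` are illustrative, not from the original article.

```python
from statistics import mean, median


def sum_squared_error(xs, mu):
    """Sum of squared differences between each x and the candidate mu."""
    return sum((x - mu) ** 2 for x in xs)


def sum_absolute_error(xs, mu):
    """Sum of absolute differences between each x and the candidate mu."""
    return sum(abs(x - mu) for x in xs)


xs = [1.0, 2.0, 4.0, 7.0, 11.0]  # made-up data with an odd number of points

# Candidate values of mu on a grid from 0.00 to 12.00 in steps of 0.01.
grid = [i / 100 for i in range(0, 1201)]

# Pick the grid point with the smallest total error under each criterion.
best_sq = min(grid, key=lambda mu: sum_squared_error(xs, mu))
best_abs = min(grid, key=lambda mu: sum_absolute_error(xs, mu))
```

For this data the search lands on `mean(xs)` for squared error and on `median(xs)` for absolute error. With an even number of points, any value between the two middle points minimizes absolute error, so a grid search may return a different minimizer than `median`.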

Best for: AI Student, Data Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Statistical Modeling, Causal Inference, and Social Science.