Cellwise Outliers
Summary
The paper "Cellwise Outliers" by Hubert, Raymaekers, and Rousseeuw, published March 31, 2026, reviews how the understanding and treatment of outliers in statistics and machine learning has shifted from casewise outliers (entire cases, i.e. rows of the data) to cellwise outliers (individual entries of a data matrix or tensor). Casewise methods assume that whole cases are anomalous, but cellwise contamination, where only individual values are erroneous, can affect more than half of the cases even when the proportion of outlying cells is small (e.g., 5% cellwise outliers in 14 dimensions contaminate about 51% of cases if the outlying cells occur independently). The authors highlight that detecting cellwise outliers and constructing cellwise robust methods both require distinct techniques, which often means giving up intuitive equivariance properties. The review covers progress in cellwise robust estimation of location and covariance matrices, regression, and principal component analysis, as well as methods for high-dimensional and tensor data. It notes that the cellwise approach is becoming dominant for high-dimensional data and that it can handle missing values.
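The 51% figure follows from a short calculation: if each cell is contaminated independently with probability eps, a case with d cells is fully clean with probability (1 - eps)^d, so it contains at least one outlying cell with probability 1 - (1 - eps)^d. The sketch below checks this in Python. The function name is hypothetical, and the independence of cells is an assumption of the calculation, not a property of real data.

```python
def contaminated_case_fraction(eps: float, d: int) -> float:
    """Expected fraction of cases containing at least one outlying cell,
    assuming each of the d cells is contaminated independently with
    probability eps."""
    return 1.0 - (1.0 - eps) ** d


# The example from the summary: 5% outlying cells in 14 dimensions.
print(round(contaminated_case_fraction(0.05, 14), 2))  # 0.51

# The same 5% cell rate becomes more harmful as the dimension grows.
for d in (5, 14, 50, 100):
    print(d, round(contaminated_case_fraction(0.05, d), 3))
```

This is why casewise methods, which typically tolerate fewer than half of the cases being contaminated, can break down in moderate to high dimensions even when very few cells are wrong.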
Key takeaway
For data scientists and statisticians working with high-dimensional datasets, understanding and applying cellwise robust methods is crucial. Traditional casewise outlier detection can fail when only individual cells, rather than entire cases, are contaminated, which leads to inaccurate models. Consider tools such as the R package "cellWise", which offers the DDC algorithm and other cellwise robust techniques, to make your analyses resilient to localized data errors, especially when dealing with complex data structures like tensors or functional data.
Key insights
Cellwise outliers are individual anomalous cells (entries) of a dataset rather than entire cases. Handling them requires distinct robust statistical methods, especially in high-dimensional data.
Principles
- Cellwise contamination can impact most cases.
- Casewise methods fail with cellwise outliers.
- Robust methods need to adapt to model fit.
Method
The Detect Deviating Cells (DDC) algorithm identifies cellwise outliers by fitting robust simple linear regressions of each variable on the other variables, using these fits to predict every cell, and flagging the cells whose residual (observed minus predicted value) is too large. The flagged cells can be visualized in a cellmap.
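The idea can be illustrated with a deliberately simplified sketch in Python. It is not the DDC implementation from the cellWise package, which contains further steps. Everything here is an assumption made for illustration: the function names, the median-of-ratios slope, the median used to combine predictions, the slope threshold of 0.5, and the cutoff of 2.576 (the square root of the 0.99 quantile of a chi-square distribution with one degree of freedom).

```python
import random
from statistics import median

CUTOFF = 2.576  # sqrt of the 0.99 chi-square quantile with 1 degree of freedom


def mad_scale(values):
    """Median absolute deviation, rescaled to match the sd at the normal."""
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)


def standardize(values):
    """Robustly center (median) and scale (MAD) one column."""
    center, scale = median(values), mad_scale(values)
    return [(v - center) / scale for v in values]


def robust_slope(x, y, min_abs_x=0.5):
    """Zero-intercept slope of y on x, taken as the median of the ratios y/x."""
    ratios = [yi / xi for xi, yi in zip(x, y) if abs(xi) > min_abs_x]
    return median(ratios) if ratios else 0.0


def flag_cells(rows, min_slope=0.5):
    """Return a matrix of booleans marking cells far from their prediction."""
    n, d = len(rows), len(rows[0])
    # z[j][i] is the robustly standardized value of variable j in row i.
    z = [standardize([row[j] for row in rows]) for j in range(d)]
    flags = [[False] * d for _ in range(n)]
    for j in range(d):
        # Keep only predictors that are clearly related to variable j.
        slopes = {k: robust_slope(z[k], z[j]) for k in range(d) if k != j}
        used = {k: b for k, b in slopes.items() if abs(b) >= min_slope}
        residuals = []
        for i in range(n):
            # Combine the single-predictor predictions with a median;
            # with no usable predictor, fall back to the column center (0).
            preds = [b * z[k][i] for k, b in used.items()]
            residuals.append(z[j][i] - (median(preds) if preds else 0.0))
        # Standardize the residuals of column j and flag the large ones.
        for i, r in enumerate(standardize(residuals)):
            flags[i][j] = abs(r) > CUTOFF
    return flags


# Synthetic demo: four variables driven by one shared latent factor.
random.seed(0)
rows = []
for _ in range(200):
    f = random.gauss(0, 1)
    rows.append([f + 0.3 * random.gauss(0, 1) for _ in range(4)])
rows[0][0] += 10.0  # plant one cellwise outlier in row 0, column 0

flags = flag_cells(rows)
print("planted cell flagged:", flags[0][0])
print("flagged cells per column:", [sum(f[j] for f in flags) for j in range(4)])
```

The matrix of flags is the kind of information a cellmap displays: one colored cell per entry, so that isolated deviating cells stand out from rows that are anomalous as a whole.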
In practice
- Use DDC for cellwise outlier detection.
- Apply robust transformations for asymmetric variables.
- Consider cellRCov for high-dimensional covariance.
Topics
- Cellwise Outliers
- Robust Statistics
- High-Dimensional Data
- Tensor Data Analysis
- Cellwise Outlier Detection
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.