Robust Local Polynomial Regression with Similarity Kernels
Summary
Robust Local Polynomial Regression with Similarity Kernels (RSKLPR) is a new nonparametric method designed to enhance the robustness of traditional Local Polynomial Regression (LPR) against outliers and high-leverage points. This framework introduces a novel weighting mechanism that utilizes two positive definite kernels, incorporating both predictor and response variables to estimate weights through localized density estimation. Implemented in Python and available at https://github.com/yaniv-shulman/rsklpr, RSKLPR demonstrates competitive performance in synthetic benchmark experiments. It consistently improves robustness and accuracy compared to standard LPR, particularly in heteroscedastic and noisy environments, and achieves these results in a single iteration, unlike some iterative robust variants. The method also exhibits reduced sensitivity to neighborhood size.
Key takeaway
Machine learning engineers and data scientists who build regression models on noisy or outlier-prone datasets should consider Robust Local Polynomial Regression with Similarity Kernels (RSKLPR). In the reported synthetic benchmarks it is more robust and more accurate than standard LPR, especially under heteroscedastic noise, and it reaches these results in a single iteration. The method is available as a public Python package; the bandwidth of the K_2 kernel is the parameter to tune when balancing robustness against potential bias.
Key insights
RSKLPR enhances LPR robustness by using similarity kernels that weight data based on both predictor and response variables.
Principles
- Incorporate response variables into kernel weighting.
- Localized density estimation mitigates outlier impact.
- Robustness can introduce asymptotic bias under non-normality.
Method
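RSKLPR generalizes LPR's kernel function by defining a compound positive definite kernel, K_D, as the product of two kernels: K_1, which depends on distance in the predictor, and K_2, which measures similarity in predictor and response through a localized estimate of the conditional or joint density. The product gives each observation a robust weight, and the local polynomial is fitted by minimizing the empirical loss under those weights.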
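The weighting described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the rsklpr package's implementation: the function names, the nearest-neighbor window, the standardization step, and the choice of a Gaussian kernel density estimate of the joint density for K_2 are all hypothetical choices made for the example.

```python
import numpy as np


def laplacian_k1(dist, scale):
    """K_1: Laplacian kernel on predictor distance (assumed form exp(-d / scale))."""
    return np.exp(-dist / scale)


def joint_density_k2(x_nb, y_nb, h2):
    """K_2: Gaussian KDE of the joint (x, y) density, evaluated at each neighbor.

    Assumption for this sketch: both coordinates are standardized inside the
    neighborhood and share one bandwidth h2.
    """
    z = np.column_stack([x_nb, y_nb])
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-12)
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / h2**2).mean(axis=1)


def local_linear_fit(x, y, x0, k=30, h2=0.5, use_k2=True):
    """Local linear estimate at x0 with weights K_D = K_1 * K_2.

    With use_k2=False the weights reduce to K_1 only, i.e. ordinary
    kernel-weighted local linear regression.
    """
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]          # k nearest neighbors in the predictor
    x_nb, y_nb, d_nb = x[idx], y[idx], dist[idx]

    w = laplacian_k1(d_nb, d_nb.max() + 1e-12)
    if use_k2:
        w = w * joint_density_k2(x_nb, y_nb, h2)

    # Weighted least squares on [1, x - x0]; the intercept is the fit at x0.
    design = np.column_stack([np.ones(len(idx)), x_nb - x0])
    sqrt_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y_nb * sqrt_w, rcond=None)
    return beta[0]


# Toy data: an exact line with a single gross outlier near x0 = 0.5.
x = np.linspace(0.0, 1.0, 201)
y = 2.0 * x + 1.0
y[103] += 50.0

print("K_1 only :", local_linear_fit(x, y, 0.5, use_k2=False))
print("K_1 * K_2:", local_linear_fit(x, y, 0.5))
```

In this toy setting the outlier sits in a low-density region of the (x, y) neighborhood, so K_2 shrinks its weight and the estimate at x0 moves closer to the underlying line than the K_1-only fit does. The outlier is downweighted, not removed, so some residual pull remains.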
In practice
- Use RSKLPR for robust regression on noisy data.
- Use a Laplacian kernel for K_1.
- Tune the K_2 bandwidth, H_2, to trade robustness against bias.
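The effect of the K_2 bandwidth can be seen in a small experiment. The sketch below is a toy illustration, not the package's code: it assumes K_2 is a Gaussian kernel density estimate of the standardized joint (x, y) density, and the helper names are hypothetical. It reports the weight of a single gross outlier relative to the median inlier weight for several bandwidths.

```python
import numpy as np


def kde_at_points(z, h2):
    """Gaussian KDE with bandwidth h2, evaluated at the sample points themselves."""
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / h2**2).mean(axis=1)


def outlier_weight_ratio(h2, outlier_index=15):
    """K_2 weight of one gross outlier divided by the median inlier K_2 weight."""
    x = np.linspace(0.0, 1.0, 30)
    y = 2.0 * x + 1.0
    y[outlier_index] += 50.0            # single contaminated response

    z = np.column_stack([x, y])
    z = (z - z.mean(axis=0)) / z.std(axis=0)

    density = kde_at_points(z, h2)
    inlier_density = np.delete(density, outlier_index)
    return density[outlier_index] / np.median(inlier_density)


for h2 in (0.5, 5.0, 50.0):
    print(f"h2={h2:5.1f}  outlier/inlier weight ratio = {outlier_weight_ratio(h2):.3f}")
```

In this sketch a very wide bandwidth flattens the density estimate, so every point receives nearly the same K_2 weight and the fit behaves like K_1-only regression. A narrower bandwidth separates the outlier from the inliers more sharply, at the price of weights that depend more strongly on the local response values.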
Topics
- Robust Local Polynomial Regression
- Similarity Kernels
- Nonparametric Regression
- Outlier Robustness
- Python Implementation
- Kernel Density Estimation
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.