Robust Local Polynomial Regression with Similarity Kernels

2026-06-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Robust Local Polynomial Regression with Similarity Kernels (RSKLPR) is a new nonparametric method that makes traditional Local Polynomial Regression (LPR) more robust to outliers and high-leverage points. It introduces a weighting mechanism built from two positive definite kernels that takes both the predictor and the response variables into account, estimating the weights through localized density estimation. The method is implemented in Python and available at https://github.com/yaniv-shulman/rsklpr. In synthetic benchmark experiments, RSKLPR performs competitively: it consistently improves robustness and accuracy over standard LPR, particularly in heteroscedastic and noisy settings, and does so in a single iteration, unlike some iterative robust variants. It is also less sensitive to neighborhood size.

Key takeaway

Machine learning engineers and data scientists building regression models on noisy or outlier-prone datasets should consider Robust Local Polynomial Regression with Similarity Kernels (RSKLPR). The method is reported to be more robust and more accurate than standard LPR, especially under heteroscedastic noise, without requiring multiple iterations. It can be applied through the publicly available Python package; tuning the bandwidth of the K_2 kernel trades robustness against potential bias.

Key insights

RSKLPR enhances LPR robustness by using similarity kernels that weight data based on both predictor and response variables.

Principles

Incorporate response variables into kernel weighting.
Localized density estimation mitigates outlier impact.
Robustness can introduce asymptotic bias under non-normality.

Method

RSKLPR generalizes LPR's kernel function by defining a compound positive definite kernel, K_D, as the product of K_1 (a kernel on predictor distance) and K_2 (a predictor/response similarity kernel obtained via conditional or joint density estimation). The resulting weights down-weight atypical observations in the locally weighted empirical loss that LPR minimizes.
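
The description above can be written as a weighted least-squares problem. The following is a sketch under stated assumptions rather than the paper's exact formulation: a scalar predictor, a squared loss, a polynomial of degree p, and K_2 written as a local density estimate evaluated at sample (x_i, y_i) are all notation chosen here for illustration.

```latex
% Sketch only: scalar predictor, squared loss, and the form of K_2 are assumptions.
\hat{\beta}(x) = \arg\min_{\beta \in \mathbb{R}^{p+1}}
  \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{p} \beta_j (x_i - x)^j \Big)^{2} \, w_i(x),
\qquad
w_i(x) = K_1(x, x_i) \, K_2(x_i, y_i),
\qquad
\hat{m}(x) = \hat{\beta}_0(x).
```

Setting K_2 to a constant recovers standard LPR, which is one way to see the method as a generalization of the usual kernel weighting.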

In practice

Use RSKLPR for robust regression on noisy data.
Use a Laplacian kernel for K_1.
Tune the K_2 bandwidth H_2 to balance robustness against bias.
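
To make the points above concrete, here is a minimal NumPy sketch of the idea. It does not use the rsklpr package and is not its API. The function names, the k-nearest-neighbour window, the per-neighbourhood standardisation, and the Gaussian joint-density estimate standing in for K_2 are all assumptions made for illustration. The parameter `h2` plays the role of the K_2 bandwidth.

```python
import numpy as np


def laplacian_kernel(dist, scale):
    """K_1 stand-in: Laplacian kernel on predictor distance."""
    return np.exp(-dist / scale)


def joint_density_weights(xn, yn, h2):
    """K_2 stand-in (assumption): Gaussian kernel density estimate of the
    joint (x, y) density, evaluated at each neighbourhood point after
    standardising both coordinates within the neighbourhood."""
    z = np.column_stack([xn, yn])
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-12)
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / h2**2).mean(axis=1)


def local_linear(x, y, x0, k, h2=None):
    """Local linear estimate at x0 from the k nearest neighbours.
    h2=None uses K_1 only (plain LPR); otherwise weights are K_1 * K_2."""
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]
    xn, yn, dn = x[idx], y[idx], dist[idx]
    w = laplacian_kernel(dn, scale=dn.max() + 1e-12)
    if h2 is not None:
        w = w * joint_density_weights(xn, yn, h2)
    # Weighted least squares via square-root weights.
    design = np.column_stack([np.ones_like(xn), xn - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], yn * sw, rcond=None)
    return beta[0]  # intercept = fitted value at x0


def compare_errors(k=40, h2=0.5, seed=0):
    """Mean absolute error of plain vs. density-weighted fits on toy data."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 200)
    truth = np.sin(2.0 * np.pi * x)
    y = truth + rng.normal(scale=0.1, size=x.size)
    y[5::10] += 3.0  # one-sided outliers on 10% of the points
    standard = np.array([local_linear(x, y, x0, k) for x0 in x])
    robust = np.array([local_linear(x, y, x0, k, h2=h2) for x0 in x])
    return np.mean(np.abs(standard - truth)), np.mean(np.abs(robust - truth))
```

Calling `compare_errors()` returns the two mean absolute errors. On this toy setup the density-weighted fit should come out with the smaller error, because isolated outliers sit in low-density regions of the joint (x, y) space and receive small weights. Very small values of `h2` also start to down-weight legitimate points, which is one way the robustness-bias trade-off shows up.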

Topics

Robust Local Polynomial Regression
Similarity Kernels
Nonparametric Regression
Outlier Robustness
Python Implementation
Kernel Density Estimation

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.