Deep regression learning from dependent observations with minimum error entropy principle
Summary
This paper introduces a deep regression learning approach for nonparametric regression using strongly mixing observations, a scenario where data points are dependent rather than independent. The method employs deep neural networks (DNNs) combined with the minimum error entropy (MEE) principle, which considers all moments of the error variable, offering robustness against non-Gaussian and heavy-tailed noise, unlike traditional $L_2$ (least squares) loss functions. Two specific estimators are analyzed: the non-penalized deep neural network (NPDNN) and the sparse-penalized deep neural network (SPDNN) predictors. The authors establish upper bounds for the expected excess risk of both estimators over Hölder and composition Hölder function classes. For models with Gaussian error, these MEE-based estimators achieve minimax optimal convergence rates, matching existing lower bounds up to a logarithmic factor. The study highlights that while the error density is assumed known, extending the work to unknown error densities remains a challenge.
Key takeaway
For AI researchers and research scientists working on nonparametric regression with dependent data, this work shows that MEE-based deep neural networks offer a robust alternative to $L_2$ loss, particularly under non-Gaussian or heavy-tailed noise. The NPDNN and SPDNN estimators are worth considering: for strongly mixing observations with Gaussian error, they achieve minimax optimal convergence rates up to a logarithmic factor. Note, however, that the theory assumes a known error density, so applying it when the error distribution is unknown calls for further research.
Key insights
MEE-based deep neural networks achieve minimax optimal rates (for Gaussian error, up to a logarithmic factor) in nonparametric regression with dependent data.
Principles
- MEE criteria account for all error moments, enhancing robustness.
- Estimators trained on strongly mixing data can still attain optimal convergence rates.
- A sparsity penalty (SPDNN) gives a sparse network with its own excess risk bound.
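To make the idea of a sparsity penalty concrete, here is a minimal sketch of a clipped $L_1$ penalty, a choice that appears in the sparse-penalized DNN literature. This summary does not state which penalty the paper uses, so the penalty form, the function name `clipped_l1_penalty`, and the parameters `lam` and `tau` are all assumptions made for illustration.

```python
import numpy as np

def clipped_l1_penalty(params, lam, tau):
    """Hypothetical sparsity penalty: lam * sum_j min(|theta_j| / tau, 1).

    params: list of weight/bias arrays of a network (illustrative only).
    lam, tau: positive tuning parameters (assumed names, not from the paper).
    """
    theta = np.concatenate([np.ravel(p) for p in params])
    return lam * np.sum(np.minimum(np.abs(theta) / tau, 1.0))

# Toy parameters: exact zeros cost nothing, large weights cost at most lam each.
weights = [np.array([[0.0, 0.05], [2.0, -3.0]]), np.array([0.0, 0.2])]
penalty = clipped_l1_penalty(weights, lam=0.1, tau=0.1)
```

In a sketch like this, the penalty would be added to the empirical risk before minimization. Because each parameter contributes at most `lam`, large weights are not shrunk further, while small ones are pushed toward zero.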
Method
The approach minimizes Shannon's entropy of the error using deep neural networks. It defines NPDNN and SPDNN estimators, establishing excess risk bounds over Hölder and composition Hölder function classes for strongly mixing observations.
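As a rough illustration of an entropy-based criterion with a known error density, the sketch below assumes the empirical risk is the negative mean log-density of the residuals, $-\frac{1}{n}\sum_i \log f(Y_i - h(X_i))$. The summary does not spell out the exact loss, so this form and all names here (`entropy_risk`, `gaussian_log_density`, `laplace_log_density`) are assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_log_density(e, sigma=1.0):
    """Log-density of N(0, sigma^2) evaluated at residuals e."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - e**2 / (2.0 * sigma**2)

def laplace_log_density(e, scale=1.0):
    """Log-density of a centered Laplace distribution (heavier tails)."""
    return -np.log(2.0 * scale) - np.abs(e) / scale

def entropy_risk(y, y_pred, log_density):
    """Assumed empirical risk: minus the mean log-density of the residuals."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return -np.mean(log_density(residuals))

# Toy data: predictions are all zero, so residuals equal the targets.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
y_pred = np.zeros(200)

risk_gauss = entropy_risk(y, y_pred, gaussian_log_density)
risk_laplace = entropy_risk(y, y_pred, laplace_log_density)
mse = np.mean((y - y_pred) ** 2)
mae = np.mean(np.abs(y - y_pred))
```

Under this assumed form, a Gaussian density reduces the risk to a rescaled squared error plus a constant, while a Laplace density gives an absolute-error criterion. That is one way to see how the choice of error density changes sensitivity to large residuals.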
In practice
- Use MEE for non-Gaussian or heavy-tailed noise.
- Consider SPDNN when a sparse network is desired; both NPDNN and SPDNN rely on the MEE criterion for robustness.
- Apply to autoregressive processes with strong mixing.
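For experimenting with the setting above, the following sketch simulates a nonlinear autoregressive series with heavy-tailed noise and turns it into regression pairs. The recursion $Y_t = 0.5\tanh(Y_{t-1}) + \varepsilon_t$, the Student-t noise, and the function name `simulate_nonlinear_ar1` are illustrative assumptions, not taken from the paper. Autoregressive processes of this kind are commonly treated as strongly mixing under suitable conditions, which this code does not verify.

```python
import numpy as np

def simulate_nonlinear_ar1(n, df=3.0, seed=0, burn_in=100):
    """Simulate Y_t = 0.5 * tanh(Y_{t-1}) + eps_t with Student-t noise.

    The first burn_in values are discarded to reduce the effect of the
    arbitrary starting point Y_0 = 0.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_t(df, size=n + burn_in)
    y = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        y[t] = 0.5 * np.tanh(y[t - 1]) + noise[t]
    return y[burn_in:]

# Dependent regression pairs: predict Y_t from the previous value Y_{t-1}.
series = simulate_nonlinear_ar1(500)
X = series[:-1].reshape(-1, 1)
Y = series[1:]
```

Consecutive pairs share observations, so the sample is dependent rather than i.i.d., and the Student-t noise with 3 degrees of freedom has heavier tails than a Gaussian.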
Topics
- Deep Neural Networks
- Nonparametric Regression
- Minimum Error Entropy
- Strong Mixing
- Minimax Optimality
Best for: AI Researcher, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.