State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

2026-05-14 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

The paper introduces wavelet-based Determinantal Point Processes (DPPs) for building minibatches and coresets in machine learning. It addresses two obstacles: constructing DPPs with strong variance reduction guarantees, and converting continuous DPPs into discrete kernels. The new wavelet-based DPPs on Euclidean space come with provably better accuracy guarantees: a standard error of $n^{-(1/2+1/d)}$ for $C^1$ functions, compared with the previous rate of $n^{-(1/2+1/(2d))}$ for multivariate orthogonal polynomial ensembles (OPEs). A general pipeline converts these continuous DPPs into discrete kernels while preserving variance decay, and it reveals a low-rank decomposition that makes sampling computationally inexpensive. The method extends DPP-based improvements to ML tasks whose objective functions have arbitrarily low regularity. Experiments on a synthetic trimodal dataset and on MNIST show superior performance in k-means coreset construction and in stochastic gradient descent (SGD) with the non-smooth hinge loss.
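To make the rate comparison concrete, the short sketch below evaluates the three standard-error scalings for a given sample size n and dimension d. It is only an illustration: constants and any logarithmic factors are ignored, the function name is hypothetical, and the $n^{-1/2}$ baseline is the usual rate for i.i.d. sampling rather than a figure taken from the paper.

```python
def standard_error_rates(n, d):
    """Return the leading-order standard-error scalings (constants ignored)."""
    return {
        "iid": n ** -0.5,                         # plain Monte Carlo baseline
        "ope": n ** -(0.5 + 1.0 / (2 * d)),       # multivariate OPE rate
        "wavelet": n ** -(0.5 + 1.0 / d),         # wavelet DPP rate for C^1 functions
    }

rates = standard_error_rates(n=100, d=2)
for name, value in rates.items():
    print(f"{name:8s} {value:.4f}")
```

With n = 100 and d = 2 the scalings are 0.1, about 0.0316, and 0.01 respectively. Because the extra exponent is 1/d, the gap between the three rates narrows as the dimension grows.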

Key takeaway

For research scientists developing or applying subsampling techniques in machine learning, this work offers a framework for building more efficient minibatches and coresets. Consider integrating wavelet-based DPPs, particularly the Haar or Daubechies-2 variants, into your sampling strategies. The approach offers stronger variance reduction and inexpensive sampling, and it is especially useful for non-smooth objective functions or rough data, where it may lead to faster convergence and more accurate models in applications such as k-means and SGD.

Key insights

Wavelet-based DPPs, combined with a general continuous-to-discrete pipeline, improve minibatch and coreset sampling efficiency, including for ML tasks with non-smooth objectives.

Principles

Continuous DPPs are more analytically tractable than their discrete counterparts.
Variance decay rates can adapt to function regularity.
Low-rank decomposition enables efficient DPP sampling.
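To illustrate why a low-rank decomposition makes sampling cheap, here is a minimal sketch of the standard sequential sampler for a projection DPP with kernel K = Q Qᵀ, where Q is an N × r matrix with orthonormal columns. It never forms the N × N kernel. The function name and the random test matrix are assumptions for illustration; this is not necessarily the sampler the authors use.

```python
import numpy as np

def sample_projection_dpp(Q, rng):
    """Draw one sample from the projection DPP with kernel K = Q @ Q.T.

    Q is an (N, r) array with orthonormal columns (assumed, not checked).
    Returns a list of r distinct indices in range(N).
    """
    V = np.array(Q, dtype=float)
    n_items = V.shape[0]
    sample = []
    while V.shape[1] > 0:
        # Squared row norms are the diagonal of the current projection kernel.
        probs = np.sum(V ** 2, axis=1)
        if sample:
            probs[sample] = 0.0  # guard against round-off on already chosen items
        probs /= probs.sum()
        i = int(rng.choice(n_items, p=probs))
        sample.append(i)
        # Restrict the column space to vectors that vanish at item i.
        j = int(np.argmax(np.abs(V[i])))
        pivot = V[:, j] / V[i, j]
        V = V - np.outer(pivot, V[i])   # row i and column j become zero
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)      # re-orthonormalize the remaining basis
    return sample

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 5)))  # toy rank-5 factor
batch = sample_projection_dpp(Q, rng)
print(sorted(batch))
```

Each step works only with the N × r factor, so the cost depends on the rank r rather than on N². Re-running the QR step at every iteration keeps the sketch short; an incremental Gram-Schmidt update would be cheaper.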

Method

A general pipeline converts continuous wavelet-based DPPs into discrete kernels, preserving variance decay and providing a low-rank decomposition for efficient sampling, applicable even to non-smooth objective functions.
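One simple way to picture a continuous-to-discrete conversion is to evaluate a finite wavelet basis at the data points and orthonormalize the resulting feature matrix. The sketch below does this for 1-D data in [0, 1) with Haar functions. The construction, the function names, and the SVD-based orthonormalization are assumptions chosen for illustration, so the paper's pipeline may differ in its details.

```python
import numpy as np

def haar_features(x, levels):
    """Evaluate the Haar scaling function and wavelets up to `levels` at x in [0, 1)."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]  # scaling function on [0, 1)
    for j in range(levels):
        for k in range(2 ** j):
            t = (2 ** j) * x - k
            psi = ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)
            cols.append(2 ** (j / 2) * psi)
    return np.column_stack(cols)  # shape (N, 2 ** levels)

def discrete_projection_kernel(x, levels, tol=1e-10):
    """Return (Q, K): an orthonormal low-rank factor and the kernel K = Q @ Q.T."""
    Phi = haar_features(x, levels)
    U, s, _ = np.linalg.svd(Phi, full_matrices=False)
    Q = U[:, s > tol * s.max()]  # drop directions not supported by the data
    return Q, Q @ Q.T

rng = np.random.default_rng(0)
x = rng.random(200)
Q, K = discrete_projection_kernel(x, levels=3)
print(Q.shape, round(float(np.trace(K)), 6))
```

The resulting K is a projection matrix, so it defines a DPP whose sample size equals its rank, and its diagonal gives each point's inclusion probability. The factor Q is the low-rank decomposition that a sampler can use directly.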

In practice

Use wavelet DPPs for k-means coreset construction.
Apply wavelet DPPs to SGD with non-smooth loss functions.
Consider Haar or Daubechies-2 wavelets for improved performance.
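For the SGD use case, a DPP-sampled minibatch is typically turned into a gradient estimate by weighting each example by the inverse of its inclusion probability, which for a DPP with marginal kernel K is the diagonal entry K_ii. The sketch below applies that weighting to the hinge-loss subgradient. The function names are hypothetical, and whether the paper uses exactly this weighting is an assumption.

```python
import numpy as np

def hinge_subgradients(w, X, y):
    """Per-example subgradients of max(0, 1 - y * <w, x>) with respect to w."""
    margins = y * (X @ w)
    active = (margins < 1.0).astype(float)  # examples inside the margin
    return -(active * y)[:, None] * X       # shape (n, d)

def weighted_minibatch_gradient(w, X, y, batch, inclusion_probs):
    """Inverse-probability-weighted estimate of the mean subgradient over all data."""
    batch = np.asarray(batch)
    g = hinge_subgradients(w, X[batch], y[batch])
    weights = 1.0 / inclusion_probs[batch]
    return (weights[:, None] * g).sum(axis=0) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = np.where(rng.random(100) < 0.5, -1.0, 1.0)
w = np.zeros(3)

# Sanity check: with every point included (probability 1), the estimate
# reduces to the exact mean subgradient.
full = hinge_subgradients(w, X, y).mean(axis=0)
check = weighted_minibatch_gradient(w, X, y, np.arange(100), np.ones(100))
print(np.allclose(full, check))
```

In a real run, `batch` would be the indices returned by a DPP sampler and `inclusion_probs` would be the diagonal of the discrete kernel. The weighting keeps the estimate unbiased for any sampling scheme with known, nonzero inclusion probabilities.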

Topics

Determinantal Point Processes
Wavelet Kernels
Continuous-to-Discrete Kernel Conversion
Variance Reduction
Minibatch Optimization

Code references

PyWavelets/pywt

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.