Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

2026-05-21 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Real-world bandit deployments, such as recommendation and clinical dosing, often involve rewards on a low-dimensional latent subspace that drifts. The paper "Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity" presents a novel approach to these piecewise-stationary low-rank linear contextual bandits with scalar feedback. The authors establish three necessary and jointly sufficient probe-side conditions for recovering the moving subspace from quadratic reward functionals. They propose SPSC (Single-Play Subspace-Calibrated Optimism), an algorithm that interleaves isotropic probes with windowed projected ridge-UCB exploitation. SPSC achieves a costed dynamic regret of \u00d5(r\u221aT)+\u00d5(T^2/3)+O(W\u2009V_in), replacing the ambient \u00d5(d\u221aT) rate with the intrinsic rank r. Empirical evaluations across eleven benchmarks, including synthetic, UCI/MovieLens, clinical, and ZOZOTOWN production data, demonstrate SPSC's superior performance when d-r\u227d T^1/6, aligning with theoretical predictions.

Key takeaway

For machine learning engineers building high-dimensional sequential decision systems with drifting latent structures, SPSC offers a robust solution. You should consider deploying SPSC when your ambient dimension d significantly exceeds the intrinsic rank r, specifically if d-r \u227d T^1/6. This method provides substantial regret reductions by adapting to changing low-rank subspaces, outperforming traditional non-stationary and fixed low-rank approaches. It's also resilient to mild overestimation of the true rank.

Key insights

Scalar rewards enable quadratic measurements for changing low-rank subspace recovery in non-stationary bandits.

Principles

Scalar rewards yield quadratic measurements of the second moment.
Subspace recovery requires known noise variance and full probe support.
Probe acquisition cost must be balanced against exploitation regret.

Method

SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation within the learned r-dimensional subspace, using a CUSUM-style detector for online change point discovery.

In practice

Apply to personalized recommendation systems.
Use for clinical dosing and ad allocation.

Topics

Low-Rank Bandits
Non-Stationary Learning
Contextual Bandits
Subspace Tracking
Dynamic Regret
Online Algorithms

Code references

HamedKhosravi99/spsc-code

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.