Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
Summary
Real-world bandit deployments, such as recommendation and clinical dosing, often involve rewards on a low-dimensional latent subspace that drifts. The paper "Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity" presents a novel approach to these piecewise-stationary low-rank linear contextual bandits with scalar feedback. The authors establish three necessary and jointly sufficient probe-side conditions for recovering the moving subspace from quadratic reward functionals. They propose SPSC (Single-Play Subspace-Calibrated Optimism), an algorithm that interleaves isotropic probes with windowed projected ridge-UCB exploitation. SPSC achieves a costed dynamic regret of \u00d5(r\u221aT)+\u00d5(T^2/3)+O(W\u2009V_in), replacing the ambient \u00d5(d\u221aT) rate with the intrinsic rank r. Empirical evaluations across eleven benchmarks, including synthetic, UCI/MovieLens, clinical, and ZOZOTOWN production data, demonstrate SPSC's superior performance when d-r\u227d T^1/6, aligning with theoretical predictions.
Key takeaway
For machine learning engineers building high-dimensional sequential decision systems with drifting latent structures, SPSC offers a robust solution. You should consider deploying SPSC when your ambient dimension d significantly exceeds the intrinsic rank r, specifically if d-r \u227d T^1/6. This method provides substantial regret reductions by adapting to changing low-rank subspaces, outperforming traditional non-stationary and fixed low-rank approaches. It's also resilient to mild overestimation of the true rank.
Key insights
Scalar rewards enable quadratic measurements for changing low-rank subspace recovery in non-stationary bandits.
Principles
- Scalar rewards yield quadratic measurements of the second moment.
- Subspace recovery requires known noise variance and full probe support.
- Probe acquisition cost must be balanced against exploitation regret.
Method
SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation within the learned r-dimensional subspace, using a CUSUM-style detector for online change point discovery.
In practice
- Apply to personalized recommendation systems.
- Use for clinical dosing and ad allocation.
Topics
- Low-Rank Bandits
- Non-Stationary Learning
- Contextual Bandits
- Subspace Tracking
- Dynamic Regret
- Online Algorithms
Code references
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.