Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

2026-06-19 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A novel algorithm, SME-OFU, addresses Stochastic Linear Contextual Bandits (SLCB) by explicitly leveraging bounded reward noise, a condition common in applications like personalized recommendations and mobile health. Unlike existing methods that assume sub-Gaussian noise and achieve an ~O(sqrt(T)) regret bound, SME-OFU employs Set-Membership Estimation (SME) for uncertainty quantification within an Optimism in the Face of Uncertainty (OFU) framework. This approach yields a significantly improved, instance-independent regret bound of O(log T), specifically O(d^3 log(dT/delta)). The algorithm's theoretical analysis required a new convex geometry-based proof framework due to SME's polytopic uncertainty sets. Empirical simulations demonstrate SME-OFU's superior performance over sub-Gaussian noise benchmarks when reward noise is indeed bounded.

Key takeaway

For Machine Learning Engineers developing online decision-making systems like personalized recommenders or mobile health interventions where reward noise is inherently bounded, you should re-evaluate standard contextual bandit algorithms. Instead of relying on sub-Gaussian noise assumptions, consider SME-OFU or similar Set-Membership Estimation (SME)-based approaches. This can significantly improve your system's long-term performance by achieving an O(log T) regret, a substantial gain over the ~O(sqrt(T)) bounds of traditional methods, provided you have a valid noise upper bound.

Key insights

Explicitly leveraging bounded reward noise in contextual bandits can yield logarithmic regret, significantly outperforming square-root bounds.

Principles

Bounded noise offers stronger information than sub-Gaussian assumptions.
Set-membership estimation (SME) quantifies uncertainty effectively with bounded noise.
Optimism in the Face of Uncertainty (OFU) guides arm selection using uncertainty sets.

Method

The SME-OFU algorithm iteratively constructs polytopic uncertainty sets for the unknown parameter using historical data and a known noise bound, then selects the arm maximizing expected reward over this set.

In practice

Apply to personalized recommendation systems with binary click rewards.
Use in mobile health for bounded metrics like sedentary time or walking steps.

Topics

Stochastic Linear Contextual Bandits
Bounded Noise
Set-Membership Estimation
Optimism in the Face of Uncertainty
Regret Bounds
Convex Geometry

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.