Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

2026-04-30 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, E-commerce & Digital Commerce · Depth: Expert, extended

Summary

Budget-Constrained Causal Bandits (BCCB) is an online framework for treatment allocation in digital advertising that targets cold-start scenarios, where historical data is scarce. Traditional two-stage offline pipelines and end-to-end Decision-Focused Learning methods require substantial pre-collected data. BCCB instead makes decisions one user at a time while simultaneously learning individual-level ad effectiveness, exploring uncertain user responses, and pacing budget spending. On the Criteo Uplift dataset, BCCB shows a data-efficiency crossover: it operates effectively from the first user, whereas offline methods need approximately 10,000 observations for reliable results. BCCB also exhibits 3-5x lower performance variance than offline methods, which makes outcomes more predictable for campaign planning, and it consistently outperforms other online methods, such as standard Thompson Sampling and greedy heterogeneous treatment effect (HTE) estimation, across budget levels.

Key takeaway

For AI Engineers and Research Scientists building advertising allocation systems, BCCB is aimed at cold-start scenarios. If a campaign has fewer than roughly 10,000 historical observations, BCCB can deliver more reliable and stable performance than traditional offline uplift modeling. Consider its unified approach to HTE learning, exploration, and budget pacing to maximize conversions and keep budget utilization predictable in dynamic environments.

Key insights

BCCB provides a data-efficient online framework for budget-constrained ad allocation, outperforming offline methods in cold-start scenarios.

Principles

Online learning excels in data-scarce environments.
Unified learning and allocation improves performance.
Budget pacing is critical for sequential decisions.

Method

BCCB combines three components into a single sequential decision rule: online Heterogeneous Treatment Effect (HTE) estimation using two classifiers, Thompson Sampling for exploration via Beta posteriors, and adaptive budget pacing based on the remaining budget and horizon.
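
To make the interaction of these three components concrete, here is a minimal Python sketch. It is not the paper's implementation, and several pieces are assumptions made for illustration: users are bucketed into discrete segments, a pair of per-segment Beta posteriors (treated and control) stands in for the two classifiers, and pacing uses a simple "budget price" that rises when spending runs ahead of the remaining budget per remaining user. The names `BetaArm`, `PacedCausalBandit`, `cost`, `step_size`, and `price` are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BetaArm:
    """Beta posterior over a conversion rate, starting from a uniform Beta(1, 1) prior."""
    alpha: float = 1.0
    beta: float = 1.0

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, converted):
        if converted:
            self.alpha += 1.0
        else:
            self.beta += 1.0


@dataclass
class PacedCausalBandit:
    """Illustrative sketch only; segments and the pacing rule are assumptions."""
    budget: float
    horizon: int
    cost: float = 1.0        # assumed fixed cost of showing one ad
    step_size: float = 0.05  # assumed learning rate for the pacing price
    seed: int = 0
    price: float = 0.0       # current price of one unit of budget
    spent: float = 0.0
    seen: int = 0
    arms: dict = field(default_factory=dict)

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def decide(self, segment):
        """Return True to show the ad to a user from this segment."""
        treated, control = self.arms.setdefault(segment, (BetaArm(), BetaArm()))
        # Thompson Sampling: draw one plausible uplift from the two posteriors.
        uplift = treated.sample(self.rng) - control.sample(self.rng)
        remaining_budget = self.budget - self.spent
        remaining_users = max(self.horizon - self.seen, 1)
        affordable = remaining_budget >= self.cost
        # Treat only if the sampled uplift beats the current price of the ad.
        treat = affordable and uplift > self.price * self.cost
        # Pacing: compare actual spend with the affordable spend per remaining user.
        target = remaining_budget / remaining_users
        spend = self.cost if treat else 0.0
        self.price = max(0.0, self.price + self.step_size * (spend - target))
        self.spent += spend
        self.seen += 1
        return treat

    def observe(self, segment, treat, converted):
        """Update the posterior of whichever arm the user actually received."""
        treated, control = self.arms[segment]
        (treated if treat else control).update(converted)


def simulate(n_users=1000, budget=200.0, seed=1):
    """Toy environment (made up): only segment 'a' responds to the ad."""
    env = random.Random(seed)
    bandit = PacedCausalBandit(budget=budget, horizon=n_users)
    for _ in range(n_users):
        segment = env.choice(["a", "b"])
        treat = bandit.decide(segment)
        rate = 0.30 if (treat and segment == "a") else 0.10
        bandit.observe(segment, treat, env.random() < rate)
    return bandit
```

Calling `simulate()` runs the rule against a made-up two-segment environment. By construction the sketch never spends more than `budget`, because an ad is shown only while the remaining budget covers its cost. The price update is one common way to pace spending; the paper's exact pacing rule may differ.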

In practice

Use BCCB for new ad campaigns or market expansions.
Prioritize online methods when fewer than about 10,000 historical observations are available.
Expect 3-5x lower performance variance with BCCB than with offline methods.

Topics

Budget-Constrained Bandits
Uplift Modeling
Heterogeneous Treatment Effects
Thompson Sampling
Cold-Start Learning

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.