Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models
Summary
Evaluations of graph-based neural Marketing Mix Models (MMMs) often conflate forecasting accuracy with attribution, which leads to a failure mode called attribution bypass. Bypass occurs when a high-capacity decoder achieves low forecasting error, such as MSE@7 around 0.004, via target autoregression or dense communication, without routing counterfactual sensitivity through the attribution graph. The authors introduce DICE-MMM, a bounded diagnostic and training framework designed to separate graph recovery, forecasting accuracy, and graph-aligned perturbation influence. DICE-MMM trains a graph encoder with a restricted decoder in Stage 1, then freezes it to train a graph-safe latent decoder in Stage 2. Evaluated with CIG, AR-CIG, and graph-swap tests, DICE-MMM improves stable graph recovery over CausalMMM. Experiments show that low MSE can mask attribution bypass: AR-CIG nAUPRC stays near zero even when forecasting error is low, whereas an oracle graph achieves 0.807 ± 0.129 nAUPRC. The core bottleneck is identified as graph-support selection, not forecasting or decoder capacity.
Key takeaway
For data scientists building or evaluating graph-based neural Marketing Mix Models, low forecasting error alone is not enough to validate attribution. Your models might be experiencing "attribution bypass," where high accuracy masks incorrect channel influence. Use a diagnostic framework like DICE-MMM, together with tests such as CIG, AR-CIG, and graph-swap, to check that your decoder's sensitivity is actually graph-aligned, and focus on robust graph-support selection to avoid misleading insights.
Key insights
Forecasting accuracy in neural MMMs does not guarantee correct attribution and can hide a "decoder bypass" failure.
Principles
- Forecasting and attribution are distinct goals.
- Low forecasting error can mask attribution failures.
- Graph-support selection is a critical bottleneck.
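To make the second principle concrete, here is a minimal sketch, not taken from the paper, of how a target-autoregressive forecaster can score a low error while reporting no channel effect at all. The data-generating process, the coefficients, and the name `persistence_forecast` are assumptions chosen purely for illustration.

```python
import math

# Hypothetical toy data: one spend channel x and a smooth target y.
# Assumed true process for this sketch: y_t = 0.9 * y_{t-1} + 0.1 * x_t
T = 200
x = [1.0 + 0.1 * math.sin(t / 5.0) for t in range(T)]
y = [1.0]
for t in range(1, T):
    y.append(0.9 * y[t - 1] + 0.1 * x[t])


def persistence_forecast(y_prev, x_next):
    """Target-autoregressive forecaster: repeats the last target value
    and ignores the planned spend x_next entirely."""
    return y_prev


# One-step-ahead forecasting error over the whole series.
sq_errors = [(persistence_forecast(y[t - 1], x[t]) - y[t]) ** 2 for t in range(1, T)]
mse = sum(sq_errors) / len(sq_errors)

# Counterfactual check at the last step: raise spend by 50%.
t0 = T - 1
x_cf = 1.5 * x[t0]
model_effect = persistence_forecast(y[t0 - 1], x_cf) - persistence_forecast(y[t0 - 1], x[t0])
# Under the assumed true process, changing x_t by dx moves y_t by 0.1 * dx.
true_effect = 0.1 * (x_cf - x[t0])

print(f"MSE={mse:.6f}  model effect={model_effect}  true effect={true_effect:.4f}")
```

The forecaster's error is small because the target moves slowly, yet its response to a spend change is exactly zero while the assumed true response is not. A forecasting metric alone cannot tell these two situations apart.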
Method
In Stage 1, DICE-MMM trains a graph encoder with a restricted decoder. In Stage 2, it freezes the encoder and trains a graph-safe latent decoder whose communication must pass through the supplied graph.
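The idea of forcing decoder communication through a supplied graph can be illustrated with a toy linear layer. This is a sketch under stated assumptions: the summary does not describe the actual DICE-MMM decoder, so `masked_decoder`, `dense_decoder`, the weights, and the graph below are all hypothetical. The masked version multiplies each cross-channel weight by the adjacency entry, so finite-difference sensitivity on non-edges is zero by construction, while the dense version leaks influence outside the graph.

```python
def masked_decoder(x, weights, graph):
    """Toy graph-restricted layer: y[i] = sum_j graph[i][j] * weights[i][j] * x[j]."""
    n = len(x)
    return [sum(graph[i][j] * weights[i][j] * x[j] for j in range(n)) for i in range(n)]


def dense_decoder(x, weights, graph):
    """Toy unrestricted layer: accepts the graph argument but never uses it."""
    n = len(x)
    return [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]


def off_graph_leak(decoder, x, weights, graph, eps=1e-3):
    """Largest finite-difference sensitivity |dy_i/dx_j| over pairs with no edge."""
    base = decoder(x, weights, graph)
    n = len(x)
    leak = 0.0
    for j in range(n):
        x_pert = list(x)
        x_pert[j] += eps  # perturb one source channel at a time
        out = decoder(x_pert, weights, graph)
        for i in range(n):
            if graph[i][j] == 0:
                leak = max(leak, abs(out[i] - base[i]) / eps)
    return leak


# Hypothetical 3-channel example; graph[i][j] == 1 means channel j may influence output i.
graph = [[1, 0, 0],
         [1, 1, 0],
         [0, 0, 1]]
weights = [[0.5, 0.3, 0.2],
           [0.4, 0.6, 0.1],
           [0.2, 0.7, 0.9]]
x = [1.0, 2.0, 3.0]

masked_leak = off_graph_leak(masked_decoder, x, weights, graph)
dense_leak = off_graph_leak(dense_decoder, x, weights, graph)
print("masked leak:", masked_leak, " dense leak:", dense_leak)
```

A real latent decoder would be nonlinear and trained, but the same check applies: perturb each source channel and confirm that outputs without an incoming edge from it do not move.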
In practice
- Use DICE-MMM to diagnose attribution bypass.
- Evaluate decoders with CIG, AR-CIG, and graph-swap tests.
- Prioritize robust graph-support selection.
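One plausible form of a graph-swap check is sketched below. The exact test used in the paper is not specified in this summary, so `graph_swap_gap`, both toy decoders, and the swapped graph are assumptions for illustration. The idea is to hold the inputs fixed, replace the graph with a different one, and measure how much the predictions change: a decoder that truly routes through the graph should react, while a bypassing decoder will not.

```python
def graph_routed(x, graph):
    """Toy decoder: output i sums only the channels j with graph[i][j] == 1."""
    return [sum(g * v for g, v in zip(row, x)) for row in graph]


def bypassing(x, graph):
    """Toy decoder that ignores the graph: every output is the mean of all channels."""
    mean = sum(x) / len(x)
    return [mean for _ in graph]


def graph_swap_gap(decoder, inputs, graph, swapped):
    """Mean absolute change in predictions when `graph` is replaced by `swapped`."""
    total, count = 0.0, 0
    for x in inputs:
        original = decoder(x, graph)
        altered = decoder(x, swapped)
        total += sum(abs(a - b) for a, b in zip(original, altered))
        count += len(original)
    return total / count


graph = [[1, 0, 0],
         [1, 1, 0],
         [0, 0, 1]]
swapped = [[0, 0, 1],   # hypothetical alternative support used for the swap
           [0, 1, 1],
           [1, 0, 0]]
inputs = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9], [2.0, 0.0, 1.0]]

routed_gap = graph_swap_gap(graph_routed, inputs, graph, swapped)
bypass_gap = graph_swap_gap(bypassing, inputs, graph, swapped)
print("routed gap:", routed_gap, " bypass gap:", bypass_gap)
```

A gap of zero under a swap is a warning sign that the decoder's predictions do not depend on the graph it was given, regardless of how low its forecasting error is.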
Topics
- Marketing Mix Models
- Attribution Modeling
- Graph Neural Networks
- Model Diagnostics
- Forecasting Accuracy
- DICE-MMM
Best for: Research Scientist, AI Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.