Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

Source: Machine Learning · Field: Technology & Digital (Artificial Intelligence & Machine Learning; Data Science & Analytics; Marketing, Branding & Advertising) · Depth: Expert, quick

Summary

Evaluations of graph-based neural Marketing Mix Models (MMMs) often conflate forecasting accuracy with attribution, which leads to a failure mode called attribution bypass. In attribution bypass, a high-capacity decoder reaches low forecasting error (for example, MSE@7 around 0.004) through target autoregression or dense communication, without routing counterfactual sensitivity through the attribution graph. The authors introduce DICE-MMM, a bounded diagnostic and training framework that separates three things: graph recovery, forecasting accuracy, and graph-aligned perturbation influence. Stage 1 trains a graph encoder with a restricted decoder; Stage 2 freezes the encoder and trains a graph-safe latent decoder. Evaluated with CIG, AR-CIG, and graph-swap tests, DICE improves stable graph recovery over CausalMMM. The experiments show that low MSE can mask attribution bypass: AR-CIG nAUPRC stays near zero despite low MSE, whereas an oracle graph achieves 0.807 ± 0.129 nAUPRC. The core bottleneck is therefore graph-support selection, not forecasting or decoder capacity.

Key takeaway

For data scientists building or evaluating graph-based neural Marketing Mix Models, low forecasting error alone does not validate attribution. A model may exhibit "attribution bypass," where high accuracy masks incorrect channel influence. Use a diagnostic framework such as DICE-MMM, together with tests such as CIG, AR-CIG, and graph-swap, to check that the decoder's sensitivity is graph-aligned, and focus on robust graph-support selection so that the resulting attributions are not misleading.

Key insights

Forecasting accuracy in neural MMMs does not guarantee correct attribution; low error can hide a "decoder bypass" failure.
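
To make the bypass idea concrete, here is a minimal toy sketch of a perturbation check. It is not the paper's CIG or AR-CIG metric, whose definitions are not given in this summary; the function `perturbation_influence`, the two toy predictors, and all numbers are hypothetical and chosen only for illustration. The sketch bumps each channel's spend and records how much the forecast moves, then compares that against an assumed graph support.

```python
def perturbation_influence(predict, spend, prev_target, eps=1.0):
    """Absolute change in the forecast when each channel's spend is bumped by eps."""
    base = predict(spend, prev_target)
    influence = []
    for i in range(len(spend)):
        bumped = list(spend)
        bumped[i] += eps
        influence.append(abs(predict(bumped, prev_target) - base))
    return influence


# Assumed toy graph: channels 0 and 2 drive the target, channel 1 does not.
graph_support = [1, 0, 1]


def ar_bypass(spend, prev_target):
    # Hypothetical bypass predictor: forecasts from the target's own
    # history only, so it can track the series without looking at spend.
    return 0.98 * prev_target


def graph_routed(spend, prev_target):
    # Hypothetical graph-aligned predictor: nonzero weights only on
    # the channels that are in graph_support.
    weights = [0.5, 0.0, 0.25]
    return sum(w * s for w, s in zip(weights, spend))


def matches_support(influence, support):
    """True when a channel has nonzero influence exactly where the graph has an edge."""
    return all((inf > 0) == bool(edge) for inf, edge in zip(influence, support))


spend = [2.0, 4.0, 8.0]
prev_target = 3.0

bypass_inf = perturbation_influence(ar_bypass, spend, prev_target)
routed_inf = perturbation_influence(graph_routed, spend, prev_target)

print("bypass influence:", bypass_inf, "aligned:", matches_support(bypass_inf, graph_support))
print("routed influence:", routed_inf, "aligned:", matches_support(routed_inf, graph_support))
```

In this toy setup the autoregressive predictor shows no sensitivity to any channel, however well it might forecast, while the graph-routed predictor responds only on the assumed support. A real diagnostic would score influence against the graph with a ranking metric rather than an exact match.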

Method

DICE-MMM first trains a graph encoder with a restricted decoder, then freezes the encoder and trains a graph-safe latent decoder whose communication must pass through the supplied graph.
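
One simple way to picture "communication must pass through the supplied graph" is a decoder whose weights are masked by the adjacency matrix. The sketch below is an assumption for illustration, not DICE-MMM's actual decoder, and it omits the two-stage training and freezing entirely; `masked_decode`, the adjacency layout (rows are channels, columns are outputs), and all values are hypothetical. It also shows one plausible reading of a graph-swap check: if the decoder really uses the graph, swapping the graph should change the output.

```python
def masked_decode(latents, weights, adjacency):
    """Linear decoder where out[j] only sees latents[i] if adjacency[i][j] == 1."""
    n_out = len(adjacency[0])
    return [
        sum(adjacency[i][j] * weights[i][j] * latents[i] for i in range(len(latents)))
        for j in range(n_out)
    ]


# Hypothetical supplied graph: 3 channel latents -> 2 outputs.
adjacency = [
    [1, 0],  # channel 0 feeds output 0 only
    [0, 1],  # channel 1 feeds output 1 only
    [1, 1],  # channel 2 feeds both outputs
]
weights = [
    [0.5, 2.0],
    [1.0, 0.5],
    [0.25, 1.0],
]
latents = [2.0, 4.0, 8.0]

out = masked_decode(latents, weights, adjacency)

# Perturb channel 1: only output 1 may move, because the graph has
# no edge from channel 1 to output 0.
bumped = masked_decode([2.0, 5.0, 8.0], weights, adjacency)

# Graph-swap style check (assumed form): exchange the edges of channels 0 and 1.
swapped_adjacency = [
    [0, 1],
    [1, 0],
    [1, 1],
]
out_swapped = masked_decode(latents, weights, swapped_adjacency)

print("output:", out)
print("after bumping channel 1:", bumped)
print("with swapped graph:", out_swapped)
```

Because the mask zeroes every weight that has no matching edge, sensitivity cannot leak along paths the graph does not contain. A decoder that ignored the graph would return the same output under the swapped adjacency, which is the kind of behavior a graph-swap test is meant to expose.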

Best for: Research Scientist, AI Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.