Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

2026-06-10 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Marketing, Branding & Advertising · Depth: Expert, quick

Summary

Graph-based neural Marketing Mix Models (MMMs) often conflate forecasting accuracy with attribution, which leads to a failure mode called attribution bypass: a high-capacity decoder reaches low forecasting error (for example, MSE@7 around 0.004) through target autoregression or dense communication, without routing counterfactual sensitivity through the attribution graph. The researchers introduce DICE-MMM, a bounded diagnostic and training framework that separates three quantities: graph recovery, forecasting accuracy, and graph-aligned perturbation influence. In Stage 1, DICE-MMM trains a graph encoder with a restricted decoder; in Stage 2, it freezes that encoder and trains a graph-safe latent decoder. Evaluated with CIG, AR-CIG, and graph-swap tests, DICE improves stable graph recovery over CausalMMM. The experiments show that low MSE can mask attribution bypass: AR-CIG nAUPRC stays near zero despite low MSE, whereas an oracle graph reaches 0.807 ± 0.129 nAUPRC. The core bottleneck is graph-support selection, not forecasting or decoder capacity.

Key takeaway

For data scientists building or evaluating graph-based neural Marketing Mix Models, low forecasting error alone does not validate attribution. A model can suffer from "attribution bypass," where high accuracy masks incorrect channel influence. Use a diagnostic framework such as DICE-MMM, together with CIG, AR-CIG, and graph-swap tests, to check that the decoder's sensitivity is actually graph-aligned, and focus on robust graph-support selection to avoid misleading insights.

Key insights

Forecasting accuracy in neural MMMs does not guarantee correct attribution and can hide a "decoder bypass" failure.

Principles

Forecasting and attribution are distinct goals.
Low forecasting error can mask attribution failures.
Graph-support selection is a critical bottleneck.
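To make the principle that low forecasting error can mask attribution failures concrete, here is a minimal Python sketch on hypothetical toy data (not the article's data or models). The forecaster `ar_only` copies the previous sales value, a crude stand-in for target autoregression, while `channel_aware` routes an assumed true spend effect `TRUE_BETA`. Both reach low one-step MSE, but a finite-difference perturbation of current spend shows that only one of them actually responds to the channel.

```python
import numpy as np

# Hypothetical toy series (illustrative only): a persistent sales baseline
# plus a small, genuine effect from one media channel.
rng = np.random.default_rng(0)
T = 200
TRUE_BETA = 0.1
spend = rng.uniform(0.0, 1.0, size=T)
baseline = np.cumsum(rng.normal(0.0, 0.05, size=T))
sales = baseline + TRUE_BETA * spend


def ar_only(prev_sales, prev_spend, spend_now):
    """Bypass-style forecaster: copies yesterday's sales and ignores spend."""
    return prev_sales


def channel_aware(prev_sales, prev_spend, spend_now, beta=TRUE_BETA):
    """Forecaster that routes the (assumed known) channel effect explicitly."""
    return prev_sales - beta * prev_spend + beta * spend_now


def one_step_mse(model):
    """Mean squared one-step-ahead forecast error over the series."""
    preds = np.array([model(sales[t - 1], spend[t - 1], spend[t]) for t in range(1, T)])
    return float(np.mean((sales[1:] - preds) ** 2))


def spend_sensitivity(model, eps=1e-3):
    """Average finite-difference change in the forecast per unit of current spend."""
    diffs = [
        (model(sales[t - 1], spend[t - 1], spend[t] + eps)
         - model(sales[t - 1], spend[t - 1], spend[t])) / eps
        for t in range(1, T)
    ]
    return float(np.mean(diffs))


for name, model in [("ar_only", ar_only), ("channel_aware", channel_aware)]:
    print(f"{name}: MSE={one_step_mse(model):.4f}, sensitivity={spend_sensitivity(model):.3f}")
```

On this toy series both MSE values are small, yet the spend sensitivity of `ar_only` is exactly zero: the accuracy metric alone cannot tell the two forecasters apart.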

Method

DICE-MMM trains a graph encoder with a restricted decoder, then freezes it to train a graph-safe latent decoder whose communication must pass through the supplied graph.
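One simple way to make a decoder "graph-safe", assumed here for illustration rather than taken from the article, is to multiply every channel-to-target message by the supplied adjacency matrix, so a pair with no edge has no path for influence. The Python sketch below omits the Stage 1 encoder and all training; it uses hypothetical names (`decode`, `influence`, `support`, `A`, `V`, `W_in`) and random weights, contrasts the masked decoder with a dense one, and runs a simple graph-swap check in which replacing the supplied graph should move the decoder's sensitivity pattern with it.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CH, N_TGT = 4, 3

# Hypothetical supplied graph: A[i, j] = 1 means channel i may influence target j.
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]], dtype=float)

W_in = rng.normal(size=N_CH)          # hypothetical per-channel latent weights
V = rng.normal(size=(N_CH, N_TGT))    # hypothetical channel-to-target message weights


def decode(x, adj, masked=True):
    """Latent decoder; when masked, messages flow only along edges of adj."""
    h = np.tanh(W_in * x)                 # one latent per channel node
    weights = V * adj if masked else V    # the dense variant ignores the graph
    return h @ weights                    # sum incoming messages at each target


def influence(x, adj, masked=True, eps=1e-4):
    """Finite-difference |dy_j / dx_i| for every channel-target pair."""
    base = decode(x, adj, masked)
    out = np.zeros((N_CH, N_TGT))
    for i in range(N_CH):
        bumped = x.copy()
        bumped[i] += eps
        out[i] = np.abs(decode(bumped, adj, masked) - base) / eps
    return out


def support(m, tol=1e-8):
    """Boolean pattern of channel-target pairs with non-negligible influence."""
    return m > tol


x0 = rng.uniform(size=N_CH)
A_swapped = A[:, ::-1]                    # graph swap: reverse the target order

print(support(influence(x0, A)).astype(int))                # masked decoder
print(support(influence(x0, A, masked=False)).astype(int))  # dense decoder
print(np.array_equal(support(influence(x0, A_swapped)), A_swapped > 0))
```

If a dense decoder spreads influence to pairs outside the graph while its forecasts stay accurate, that is the bypass pattern the article describes. The masked decoder cannot route influence off-graph by construction, so in this sketch the quality of the supplied graph, rather than decoder capacity, decides whether attribution is right.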

In practice

Use DICE-MMM to diagnose attribution bypass.
Evaluate decoders with CIG, AR-CIG, and graph-swap tests.
Prioritize robust graph-support selection.
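The evaluation step can be illustrated with a perturbation-influence score: rank every channel-target pair by how strongly the decoder responds to it, then compare that ranking against a reference graph with normalized AUPRC. This is a minimal sketch under stated assumptions, not the article's CIG or AR-CIG definition. It assumes nAUPRC = (AP - prevalence) / (1 - prevalence), so a perfect ranking scores 1, a random ranking scores roughly 0, and a worse-than-random ranking goes negative. The reference graph and influence matrices are hypothetical.

```python
import numpy as np


def average_precision(scores, labels):
    """AUPRC estimated as average precision over a descending score ranking."""
    order = np.argsort(-scores, kind="stable")   # ties keep their original order
    ranked = labels[order]
    hits = np.cumsum(ranked)
    ranks = np.arange(1, len(ranked) + 1)
    return float(np.mean((hits / ranks)[ranked == 1]))


def normalized_auprc(influence, true_graph):
    """Assumed normalization: (AP - prevalence) / (1 - prevalence)."""
    scores = np.asarray(influence, dtype=float).ravel()
    labels = np.asarray(true_graph).ravel().astype(int)
    prevalence = labels.mean()
    return (average_precision(scores, labels) - prevalence) / (1.0 - prevalence)


# Hypothetical reference graph (channels x targets) and two influence matrices.
true_graph = np.array([[1, 0, 0],
                       [1, 1, 0],
                       [0, 0, 1],
                       [0, 1, 0]])
aligned = true_graph * 0.9 + 0.05   # influence concentrated on the true edges
inverted = 1 - true_graph           # influence on exactly the wrong pairs

for name, m in [("aligned", aligned), ("inverted", inverted)]:
    print(name, round(normalized_auprc(m, true_graph), 3))
```

Because ties are broken by original order, a decoder with completely flat influence gets an arbitrary score here; a real evaluation would need an explicit tie-handling rule.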

Topics

Marketing Mix Models
Attribution Modeling
Graph Neural Networks
Model Diagnostics
Forecasting Accuracy
DICE-MMM

Best for: Research Scientist, AI Scientist, Data Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.