CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

CausalMoE is a billion-scale multimodal foundation model for Granger Causal Discovery (GCD) in time series. It targets a limitation of "one-size-fits-all" neural methods, which struggle under distribution shift. It introduces a Pattern-Routed Mixture of Heterogeneous Experts (MoHE) that dynamically identifies latent temporal patterns and routes time-series patches to specialized domain experts, decoupling regime-specific mechanisms. CausalMoE also integrates Large Language Models (LLMs) and Vision-Language Models (VLMs) to align numerical signals with textual and visual priors, which regularizes causal estimation. A Causality-Aware Self-Attention mechanism recovers interpretable, sparse graphs via proximal optimization. The authors report state-of-the-art performance on the VAR, Lorenz-96, fMRI, DREAM-3, and DREAM-4 benchmarks and strong generalization, especially in few-shot settings where traditional methods fail.
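For reference, the textbook linear definition that GCD methods generalize is shown below for p series with maximum lag K. This is the classical vector-autoregressive formulation, not CausalMoE's training objective.

```latex
x_{j,t} = \sum_{i=1}^{p} \sum_{k=1}^{K} A^{(k)}_{ji}\, x_{i,t-k} + \varepsilon_{j,t},
\qquad
x_i \text{ Granger-causes } x_j \iff \exists\, k \in \{1,\dots,K\} : A^{(k)}_{ji} \neq 0 .
```

Neural GCD methods typically replace the linear map with a learned nonlinear predictor and declare an edge i → j when the prediction of x_j depends on past values of x_i.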

Key takeaway

For machine learning engineers building causal inference systems, CausalMoE is a candidate approach to Granger Causal Discovery when time series are data-scarce or heterogeneous. Because it draws on multimodal priors and adapts to regime shifts, the authors report that it recovers reliable causal structure from limited training data, including few-shot settings where traditional methods fail. It is worth evaluating if you need more accurate and interpretable temporal causal models.

Key insights

CausalMoE uses multimodal experts and pattern-routed architecture for robust Granger Causal Discovery in heterogeneous time series.

Principles

Explicitly model patch-level heterogeneity for reliable causal discovery.
Multimodal priors from LLMs/VLMs regularize causal estimation.
Causality-Aware Self-Attention yields sparse, interpretable causal graphs.
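Sparse, interpretable graphs are commonly obtained with a proximal-gradient step that zeroes out whole edges at once. The sketch below uses a standard group-lasso proximal operator, as in several neural Granger methods; it is an assumption for illustration, since the summary does not specify CausalMoE's regularizer. The per-edge lag grouping, the `lam` penalty, and the helpers `group_soft_threshold`, `proximal_step`, and `adjacency` are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch of a group-lasso proximal step; not CausalMoE's published code.
# W[j][i] holds the K lag weights linking source series i to target series j.
Tensor3 = List[List[List[float]]]


def group_soft_threshold(group: List[float], threshold: float) -> List[float]:
    """Proximal operator of threshold * ||group||_2 (group lasso)."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the whole edge is pruned at once
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


def proximal_step(weights: Tensor3, grads: Tensor3, lr: float, lam: float) -> Tensor3:
    """Gradient step on the smooth loss, then group soft-thresholding of every
    (target, source) lag vector with threshold lr * lam."""
    updated = []
    for w_row, g_row in zip(weights, grads):
        new_row = []
        for w_group, g_group in zip(w_row, g_row):
            stepped = [w - lr * g for w, g in zip(w_group, g_group)]
            new_row.append(group_soft_threshold(stepped, lr * lam))
        updated.append(new_row)
    return updated


def adjacency(weights: Tensor3) -> List[List[int]]:
    """Edge i -> j exists iff any lag weight from i to j survived the prox step."""
    return [[int(any(w != 0.0 for w in group)) for group in row] for row in weights]


# Two series, two lags; rows are targets, columns are sources.
W = [
    [[0.9, 0.1], [0.02, -0.01]],  # target 0: from 0 (strong), from 1 (weak)
    [[0.5, 0.4], [0.8, 0.0]],     # target 1: from 0 (strong), from 1 (strong)
]
G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]  # zero gradients for the demo
W_new = proximal_step(W, G, lr=0.1, lam=0.5)
print(adjacency(W_new))  # prints [[1, 0], [1, 1]]: the weak 1 -> 0 edge is pruned
```

Grouping all lags of one edge means the penalty removes or keeps an edge as a unit, which is what makes the resulting graph directly readable.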

Method

CausalMoE employs Multimodal Patch Encoding, Patch-Specific Pattern Routing to heterogeneous experts (Semantic, Multimodal, Temporal Frequency, Multiscale Temporal), and Causality-Aware Self-Attention with proximal optimization for sparse graph recovery.
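Patch-Specific Pattern Routing can be pictured as a top-k mixture-of-experts gate applied to each patch. The sketch below is an illustration under assumptions, not the paper's router: the hand-crafted features in `patch_features`, the weights `ROUTER_W`, `top_k`, and all helper names are hypothetical, and only the four expert names come from the description above. It shows the routing step only, not the experts themselves.

```python
import math
from typing import List, Tuple

# Hypothetical illustration of per-patch top-k routing; CausalMoE's learned
# router is not reproduced here. Only the four expert names come from the summary.
EXPERTS = ["semantic", "multimodal", "temporal_frequency", "multiscale_temporal"]


def patchify(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split a univariate series into fixed-length patches (overlapping if stride < patch_len)."""
    return [series[s:s + patch_len] for s in range(0, len(series) - patch_len + 1, stride)]


def patch_features(patch: List[float]) -> List[float]:
    """Hypothetical pattern descriptors: level, spread, and roughness of a patch."""
    n = len(patch)
    mean = sum(patch) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in patch) / n)
    rough = sum(abs(b - a) for a, b in zip(patch, patch[1:])) / max(n - 1, 1)
    return [mean, std, rough]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(patch: List[float], router_w: List[List[float]], top_k: int) -> List[Tuple[str, float]]:
    """Score every expert from the patch features, keep the top-k, renormalize their gates."""
    feats = patch_features(patch)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in router_w]
    probs = softmax(logits)
    chosen = sorted(range(len(EXPERTS)), key=lambda e: probs[e], reverse=True)[:top_k]
    mass = sum(probs[e] for e in chosen)
    return [(EXPERTS[e], probs[e] / mass) for e in chosen]


# One row per expert over [mean, std, roughness]; the values are illustrative only.
ROUTER_W = [
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 2.0, -1.0],
]

series = [0.0, 0.1, 0.2, 0.3, 1.0, -1.0, 1.0, -1.0]
for patch in patchify(series, patch_len=4, stride=4):
    print(route(patch, ROUTER_W, top_k=2))
```

With these illustrative weights, the smooth, rising first patch goes mainly to the semantic expert and the oscillating second patch mainly to the temporal-frequency expert. That per-patch specialization is what lets regime-specific mechanisms be handled by different experts.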

In practice

Integrate LLMs and VLMs to enrich time series representations.
Apply Mixture of Experts for adaptive modeling of temporal heterogeneity.
Use variable-wise attention for direct causal interpretation.
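Variable-wise attention is interpretable because its weight matrix has one row and one column per series, so each entry can be read as a candidate edge. The sketch below assumes plain scaled dot-product attention over per-variable embeddings and a fixed `threshold`; `variable_attention`, `read_graph`, and the toy embeddings are hypothetical. Softmax rows never contain exact zeros, so this toy version thresholds the weights, whereas the summary attributes CausalMoE's sparsity to proximal optimization.

```python
import math
from typing import List

# Hypothetical sketch; not CausalMoE's Causality-Aware Self-Attention.
Matrix = List[List[float]]


def variable_attention(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention computed across variables rather than time steps.

    attn[i][j] is how strongly target variable i attends to source variable j;
    each row is a softmax, so it sums to 1.
    """
    d = len(queries[0])
    attn = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def read_graph(attn: Matrix, threshold: float) -> List[List[int]]:
    """Hypothetical readout: edge j -> i when attn[i][j] exceeds the threshold."""
    return [[int(a > threshold) for a in row] for row in attn]


# Toy per-variable query/key embeddings for three series (illustrative values).
Q = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
K = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
attn_weights = variable_attention(Q, K)
print(read_graph(attn_weights, threshold=0.3))
# prints [[1, 0, 0], [0, 1, 0], [1, 1, 0]]: series 2 draws on series 0 and 1
```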

Topics

Granger Causal Discovery
Multimodal Foundation Models
Time Series Analysis
Mixture-of-Experts
Large Language Models
Vision-Language Models

Code references

liubolab/CausalMoE

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.