CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning; Data Science & Analytics) · Depth: Expert, extended

Summary

CausalMoE is a billion-scale multimodal foundation model designed for Granger Causal Discovery (GCD) in time series, addressing the limitations of "one-size-fits-all" neural methods that struggle with distribution shifts. It introduces a Pattern-Routed Mixture of Heterogeneous Experts (MoHE) to dynamically identify latent temporal patterns and route time-series patches to specialized domain experts, decoupling regime-specific mechanisms. CausalMoE integrates Large Language Models (LLMs) and Vision-Language Models (VLMs) to align numerical signals with textual and visual priors, regularizing causal estimation. A Causality-Aware Self-Attention mechanism ensures interpretable, sparse graph recovery via proximal optimization. Extensive experiments show CausalMoE achieves state-of-the-art performance on benchmarks like VAR, Lorenz-96, fMRI, DREAM-3, and DREAM-4, demonstrating strong generalization, especially in few-shot settings where traditional methods fail.
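The routing idea in the summary can be illustrated with a minimal sketch: split a series into patches, score each patch with a gate, and send it to the highest-scoring expert. This is not the paper's implementation. The linear gate, the top-1 rule, the two toy experts, and every name below (`make_patches`, `route_patches`, `expert_mean`, `expert_spectrum`) are assumptions chosen for illustration.

```python
import numpy as np

def make_patches(series, patch_len):
    """Split a (T, D) series into non-overlapping patches of shape (N, patch_len, D)."""
    n = series.shape[0] // patch_len
    return series[: n * patch_len].reshape(n, patch_len, series.shape[1])

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def route_patches(patches, gate_weights):
    """Score each flattened patch with a linear gate; return top-1 expert ids and gate probabilities."""
    flat = patches.reshape(patches.shape[0], -1)      # (N, patch_len * D)
    probs = softmax(flat @ gate_weights)              # (N, n_experts)
    return probs.argmax(axis=1), probs

# Hypothetical stand-in experts: each maps one (patch_len, D) patch to a (D,) feature vector.
def expert_mean(patch):
    return patch.mean(axis=0)                         # time-domain summary

def expert_spectrum(patch):
    return np.abs(np.fft.rfft(patch, axis=0)).mean(axis=0)  # frequency-domain summary

rng = np.random.default_rng(0)
series = rng.standard_normal((64, 3))                 # toy series: 64 steps, 3 variables
patches = make_patches(series, patch_len=8)           # 8 patches of 8 steps each
gate_weights = rng.standard_normal((8 * 3, 2))        # untrained gate, 2 experts (assumption)

assignments, probs = route_patches(patches, gate_weights)
experts = [expert_mean, expert_spectrum]
outputs = np.stack([experts[k](p) for k, p in zip(assignments, patches)])
```

In a trained model the gate would be learned jointly with the experts; here it is random, so the assignments only demonstrate the data flow.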

Key takeaway

For machine learning engineers building causal inference systems, CausalMoE targets Granger Causal Discovery in data-scarce or heterogeneous time series. Because it draws on multimodal priors and routes patches by temporal regime, the reported few-shot results suggest it can recover causal structure from far less training data than traditional methods need. Consider evaluating this approach where the accuracy and interpretability of your temporal causal models matter.

Key insights

CausalMoE combines multimodal experts with a pattern-routed architecture for robust Granger Causal Discovery in heterogeneous time series.

Method

CausalMoE employs Multimodal Patch Encoding, Patch-Specific Pattern Routing to heterogeneous experts (Semantic, Multimodal, Temporal Frequency, Multiscale Temporal), and Causality-Aware Self-Attention with proximal optimization for sparse graph recovery.
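To make "proximal optimization for sparse graph recovery" concrete, here is a minimal sketch on a much simpler model than the one described above: a linear VAR(1) fitted with an L1 penalty by proximal gradient descent (ISTA). The soft-thresholding step is the proximal operator of the L1 norm and produces exact zeros, which is what yields a sparse Granger-causal graph. The linear model, the penalty weight `lam`, and the simulated data are assumptions for illustration; the paper applies its proximal step inside a Causality-Aware Self-Attention mechanism, which is not reproduced here.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1: shrink toward zero, exact zeros below tau."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def sparse_var1(x, lam=0.1, n_iter=500):
    """Estimate A in x[t] = A @ x[t-1] + noise with an L1 penalty, via ISTA."""
    past, future = x[:-1], x[1:]
    n, d = past.shape
    # Step size 1/L, where L bounds the curvature of the smooth least-squares loss.
    step = n / np.linalg.norm(past, 2) ** 2
    A = np.zeros((d, d))
    for _ in range(n_iter):
        residual = past @ A.T - future                 # (n, d) one-step prediction errors
        grad = residual.T @ past / n                   # gradient of (1/2n) * ||residual||_F^2
        A = soft_threshold(A - step * grad, step * lam)  # gradient step, then proximal step
    return A

# Simulated data (assumption): series 0 drives series 1; series 2 is independent.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.0, 0.5]])
x = np.zeros((2000, 3))
for t in range(1, len(x)):
    x[t] = A_true @ x[t - 1] + rng.standard_normal(3)

A_hat = sparse_var1(x, lam=0.1)
edges = np.abs(A_hat) > 1e-8   # edges[i, j] is True when series j Granger-causes series i
```

Larger values of `lam` zero out more entries of `A_hat`, trading recall of weak edges for a sparser graph.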

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.