Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

2026-05-14 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Physical Sciences & Chemistry · Depth: Expert, quick

Summary

Shodh-MoE is a novel sparse Mixture-of-Experts (MoE) latent transformer designed to mitigate negative transfer in multi-physics foundation models, a common bottleneck when scaling Scientific Machine Learning (SciML). It targets the difficulty of co-training disparate partial differential equation (PDE) regimes, such as open-channel fluid dynamics and porous-media flow, which in dense neural operators typically causes gradient conflict and unstable optimization. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder that conserves mass to numerical precision, with a velocity divergence of about 2.8 × 10^-10 on 128^3 grids. Its Top-1 soft-semantic router dynamically assigns each latent patch to a specialized expert subnetwork, giving different physical mechanisms distinct parameter paths, while shared experts retain universal symmetries. In a 20,000-step pretraining run, the model autonomously split by domain: open-channel tokens routed to Expert 0 and porous-media tokens to Expert 1, and both regimes converged simultaneously to low validation MSE.
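
The divergence figure above is a measure of how well the velocity field conserves mass. The source does not say how the autoencoder enforces this, so the sketch below is only an illustration under stated assumptions: it uses a periodic grid, second-order central differences, and the standard trick of writing velocity as the curl of a hypothetical vector potential A, then measures the maximum discrete divergence. The helper names (central_diff, curl, max_abs_divergence) are made up for this example.

```python
import numpy as np

def central_diff(f, axis, h):
    """Second-order central difference along `axis`, assuming a periodic grid."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)

def curl(ax, ay, az, h):
    """Velocity u = curl(A); array axes 0, 1, 2 are x, y, z."""
    ux = central_diff(az, 1, h) - central_diff(ay, 2, h)
    uy = central_diff(ax, 2, h) - central_diff(az, 0, h)
    uz = central_diff(ay, 0, h) - central_diff(ax, 1, h)
    return ux, uy, uz

def max_abs_divergence(ux, uy, uz, h):
    """Max |div u| over the grid, the kind of metric quoted in the summary."""
    div = central_diff(ux, 0, h) + central_diff(uy, 1, h) + central_diff(uz, 2, h)
    return float(np.max(np.abs(div)))

n = 32                                      # small demo grid; the source reports 128^3
h = 1.0 / n
rng = np.random.default_rng(0)
potential = rng.normal(size=(3, n, n, n))   # hypothetical vector potential A

div_max_solenoidal = max_abs_divergence(*curl(*potential, h), h)
div_max_random = max_abs_divergence(*rng.normal(size=(3, n, n, n)), h)
print(f"curl field: {div_max_solenoidal:.1e}, random field: {div_max_random:.1e}")
```

Because central-difference operators commute on a periodic grid, the discrete divergence of the discrete curl cancels term by term, so the curl field's residual is floating-point round-off while an unconstrained random field shows a large divergence.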

Key takeaway

For AI scientists developing universal foundation models for scientific machine learning, Shodh-MoE offers a robust architectural answer to negative transfer. Combining sparse Mixture-of-Experts routing with physics-informed latent representations lets a single model converge simultaneously across disparate physical regimes. Consider integrating dynamic routing so that parameter paths specialize autonomously for distinct physical phenomena, which can improve both training stability and accuracy.

Key insights

Sparse Mixture-of-Experts routing effectively mitigates negative transfer in multi-physics foundation models.

Principles

Disparate PDE regimes create gradient conflict.
Specialized experts can handle distinct physics.
Shared experts preserve universal symmetries.
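
The first principle, gradient conflict, can be made concrete with a toy example. This is a hypothetical sketch, not the source's analysis: two "regimes" share one linear model, and regime B's optimum is deliberately the mirror image of regime A's. The cosine similarity between their gradients on the shared weights is then -1, and the summed update cancels. Real PDE regimes usually conflict only partially, but a negative cosine is the same warning sign.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to the shared weights w."""
    return X.T @ (X @ w - y) / len(y)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))   # shared input features
w_a = rng.normal(size=8)        # hypothetical optimum for regime A
w_b = -w_a                      # hypothetical regime B pulls the opposite way
y_a, y_b = X @ w_a, X @ w_b

w_shared = np.zeros(8)          # one dense parameter vector serving both regimes
g_a = mse_grad(w_shared, X, y_a)
g_b = mse_grad(w_shared, X, y_b)
cos_ab = cosine(g_a, g_b)
print(f"cosine(g_a, g_b) = {cos_ab:.3f}, |g_a + g_b| = {np.linalg.norm(g_a + g_b):.1e}")
```

If each regime instead had its own parameters, as a routed expert would, each gradient would update separate weights and nothing would cancel; this is the motivation for giving distinct physics distinct parameter paths.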

Method

Shodh-MoE compresses physical fields into 16^3 latents with a physics-informed autoencoder that enforces mass conservation, then uses a Top-1 soft-semantic router to dynamically assign latent patches to specialized expert subnetworks.
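
A minimal sketch of the routing step, assuming details the summary does not give: 4 latent channels, 4^3 patches (so a 16^3 latent becomes 64 tokens), small ReLU MLPs as experts, and one shared expert applied to every token. "Soft-semantic" is interpreted here, as an assumption, as a softmax over expert scores with only the top-1 expert executed and its output scaled by that expert's probability (a Switch-Transformer-style gate). The names patchify and Top1MoE are hypothetical.

```python
import numpy as np

def patchify(latent, p):
    """Split a (C, D, H, W) latent into non-overlapping p^3 patches -> (tokens, C*p^3)."""
    c, d, h, w = latent.shape
    x = latent.reshape(c, d // p, p, h // p, p, w // p, p)
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)  # (nd, nh, nw, C, p, p, p)
    return x.reshape(-1, c * p ** 3)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class Top1MoE:
    def __init__(self, dim, hidden, num_experts, rng):
        self.w_router = rng.normal(scale=dim ** -0.5, size=(dim, num_experts))
        self.experts = [self._mlp(dim, hidden, rng) for _ in range(num_experts)]
        self.shared = self._mlp(dim, hidden, rng)  # hypothetical single shared expert

    @staticmethod
    def _mlp(dim, hidden, rng):
        return (rng.normal(scale=dim ** -0.5, size=(dim, hidden)),
                rng.normal(scale=hidden ** -0.5, size=(hidden, dim)))

    @staticmethod
    def _apply(mlp, x):
        w1, w2 = mlp
        return np.maximum(x @ w1, 0.0) @ w2

    def __call__(self, tokens):
        probs = softmax(tokens @ self.w_router)       # (T, E) routing distribution
        choice = probs.argmax(axis=-1)                # Top-1 expert per token
        gate = probs[np.arange(len(tokens)), choice]  # probability of the chosen expert
        out = self._apply(self.shared, tokens)        # shared expert sees every token
        for e, mlp in enumerate(self.experts):
            idx = np.nonzero(choice == e)[0]
            if idx.size:                              # only routed tokens are computed
                out[idx] += gate[idx][:, None] * self._apply(mlp, tokens[idx])
        return out, choice

rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 16, 16, 16))  # hypothetical 4-channel 16^3 latent
tokens = patchify(latent, p=4)             # 64 tokens of dimension 4 * 4^3 = 256
moe = Top1MoE(dim=tokens.shape[1], hidden=128, num_experts=2, rng=rng)
out, choice = moe(tokens)
print(tokens.shape, out.shape, np.bincount(choice, minlength=2))
```

Only one routed expert runs per token, so compute per token stays roughly constant as experts are added, while the shared expert gives every regime a common path for properties all the physics share.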

In practice

Use sparse MoE for multi-physics problems.
Employ physics-informed autoencoders for latent compression.
Implement dynamic routing for domain bifurcation.
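
One way to check for the domain bifurcation described above is to log, for each token, its domain and the expert the router chose, then tabulate the fractions. The routing log below is invented for illustration (it is not data from the source), and domain labels are assumed to be used only for this monitoring, not as router inputs.

```python
import numpy as np

def routing_table(domain_ids, expert_ids, num_domains, num_experts):
    """Row d gives the fraction of domain d's tokens routed to each expert."""
    counts = np.zeros((num_domains, num_experts))
    np.add.at(counts, (domain_ids, expert_ids), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)  # assumes every domain has tokens

# Made-up routing log: domain 0 = open-channel, domain 1 = porous media.
domain_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])
expert_ids = np.array([0, 0, 0, 1, 1, 1, 1, 1])

table = routing_table(domain_ids, expert_ids, num_domains=2, num_experts=2)
print(table)                 # rows: domains, columns: experts
print(table.argmax(axis=1))  # dominant expert per domain -> [0 1]
```

A table that drifts toward one dominant expert per domain during training is the pattern the summary reports; rows that stay near uniform would suggest the router has not yet separated the regimes.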

Topics

Multi-Physics Foundation Models
Negative Transfer
Shodh-MoE
Sparse Mixture-of-Experts
Scientific Machine Learning

Best for: AI Scientist, Machine Learning Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.