What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado

2026-03-17 · Source: The a16z Show · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Advanced, extended

Summary

Vishal Misra's research shows that large language models (LLMs), specifically transformers, update their predictions through a precise, mathematically predictable Bayesian process. His team built a "Bayesian wind tunnel" to demonstrate this both empirically and mathematically, showing that transformers match the correct Bayesian belief updates to within about 10^-3 bits and outperform Mamba, LSTMs, and MLPs on the same tasks. While LLMs excel at correlation (Shannon entropy), Misra argues that they lack the post-training plasticity and causal understanding (Kolmogorov complexity) needed for artificial general intelligence (AGI). He proposes that AGI requires architectures capable of continual learning that move beyond pattern matching to build causal models. As an illustration he cites the "Einstein test": an LLM trained on pre-1916 physics would fail to derive relativity.

Key takeaway

AI scientists and ML engineers designing next-generation systems should recognize that current LLMs, while adept at Bayesian updating and correlation, fundamentally lack post-training plasticity and causal reasoning. To advance toward AGI, prioritize architectures that enable continual learning and that move beyond Shannon entropy to build true causal models, rather than relying solely on larger models or more training data.

Key insights

LLMs, particularly transformers, execute precise Bayesian updating, but lack the plasticity and causal reasoning essential for AGI.

Principles

Transformers update their predictions through precise Bayesian inference.
AGI demands continual learning and causal model building.
Achieving AGI requires architectural shifts, not just scale.
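For reference, the Bayesian updating named in the first principle can be written out. The notation is ours, not the source's: h ranges over a finite set of hypotheses with prior p(h), and the observed tokens x_1, ..., x_t are assumed to be conditionally independent given h (a simplifying assumption). The posterior can be computed in one batch or one token at a time, and the next-token prediction is the posterior predictive:

```latex
\begin{aligned}
p(h \mid x_{1:t}) &= \frac{p(h)\,\prod_{i=1}^{t} p(x_i \mid h)}{\sum_{h'} p(h')\,\prod_{i=1}^{t} p(x_i \mid h')} \\
                  &\propto p(x_t \mid h)\, p(h \mid x_{1:t-1}) \\
p(x_{t+1} \mid x_{1:t}) &= \sum_{h} p(x_{t+1} \mid h)\, p(h \mid x_{1:t})
\end{aligned}
```

As summarized here, the claim is that a transformer's next-token distribution closely tracks the last line on tasks where that quantity can be computed exactly.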

Method

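The "Bayesian wind tunnel" trains blank (untrained) architectures from scratch on tasks that cannot be solved by memorization and whose Bayesian posteriors are known analytically, so each model's predictions can be checked directly against the exact posterior, demonstrating precise updating.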
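A minimal sketch of the wind-tunnel idea follows, built on assumptions of our own. The task (a coin whose bias is drawn from a fixed grid), the hypothetical `stand_in_model` predictor, and the use of KL divergence in bits as the error measure are illustrative only, not the source's actual tasks or metric. Because the prior and likelihood are known, the exact posterior predictive is computable, and any predictor can be scored against it at every step of the sequence.

```python
import math
import random

# Illustrative task (our assumption): each sequence comes from a coin whose
# bias theta is drawn uniformly from a fixed grid. Prior and likelihood are
# known, so the exact Bayesian posterior predictive can be computed.
THETAS = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
PRIOR = [1 / len(THETAS)] * len(THETAS)


def bayes_predictive(seq):
    """Exact P(next bit = 1 | seq) under the grid prior."""
    ones = sum(seq)
    zeros = len(seq) - ones
    # Unnormalized posterior weight of each theta: prior * likelihood.
    weights = [p * theta**ones * (1 - theta) ** zeros for theta, p in zip(THETAS, PRIOR)]
    total = sum(weights)
    # Posterior predictive = posterior-weighted average of theta.
    return sum(w * theta for w, theta in zip(weights, THETAS)) / total


def kl_bits(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in bits; assumes 0 < q < 1."""
    return sum(a * math.log2(a / b) for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)


def stand_in_model(seq):
    """Hypothetical placeholder for a trained model's prediction: a smoothed
    frequency estimate, which is NOT the exact grid-prior Bayes answer."""
    return (sum(seq) + 0.5) / (len(seq) + 1)


def mean_gap_bits(predict, n_seqs=200, length=20, seed=0):
    """Average KL gap in bits between exact Bayes and `predict`, scored at
    every prefix so each individual belief update is tested."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_seqs):
        theta = rng.choice(THETAS)
        seq = [1 if rng.random() < theta else 0 for _ in range(length)]
        for t in range(length):
            prefix = seq[:t]
            gaps.append(kl_bits(bayes_predictive(prefix), predict(prefix)))
    return sum(gaps) / len(gaps)


print(f"stand-in model gap: {mean_gap_bits(stand_in_model):.4f} bits")
print(f"exact Bayes gap:    {mean_gap_bits(bayes_predictive):.4f} bits")
```

The exact predictor scores 0 bits by construction. In a real experiment, `stand_in_model` would be replaced by the probability that a model trained from scratch assigns to the next symbol; the source reports transformers landing around 10^-3 bits on its own tasks, a figure this toy does not attempt to reproduce.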

In practice

Use Token Probe to visualize an LLM's next-token probability distributions.
Use in-context learning with custom domain-specific languages (DSLs) for specific tasks.
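Token Probe's own interface is not described here, so the following is only a sketch of the computation such a visualization rests on. It converts a model's raw next-token scores (logits) into probabilities with a softmax, lists the most likely tokens as a text bar chart, and reports the distribution's Shannon entropy in bits. The vocabulary and logit values are made up for illustration; in practice they would come from the model's output at the next position.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max-subtracted for numerical stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def entropy_bits(probs):
    """Shannon entropy of the distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def render_top_k(probs, k=5, width=40):
    """Return a text bar chart of the k most probable tokens."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    lines = []
    for tok, p in top:
        bar = "#" * round(p * width)
        lines.append(f"{tok!r:>10} {p:6.3f} {bar}")
    return "\n".join(lines)


# Hypothetical next-token logits after a prompt such as "The capital of France is"
logits = {" Paris": 9.1, " the": 5.3, " a": 4.2, " located": 3.9, " Lyon": 2.0, " not": 1.5}
probs = softmax(logits)
print(render_top_k(probs))
print(f"entropy: {entropy_bits(probs):.3f} bits")
```

A sharply peaked distribution (low entropy) means the model is confident about the continuation; a flat one (high entropy) means it is spreading probability across many candidates.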

Topics

Large Language Models
Bayesian Inference
Artificial General Intelligence
Causal Models
Continual Learning
Transformer Architecture
Machine Learning Theory

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by The a16z Show.