What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado
Summary
Vishal Misra's research finds that large language models (LLMs), and transformers in particular, update their predictions through a precise, mathematically predictable Bayesian process. His team built a "Bayesian wind tunnel" to demonstrate this both empirically and mathematically, reporting that transformers track the correct Bayesian belief update to within 10^-3 bits and outperform Mamba, LSTMs, and MLPs. Misra associates what LLMs do well, learning correlations, with Shannon entropy, and what they lack, causal understanding, with Kolmogorov complexity. He argues that they also lack the post-training plasticity needed for Artificial General Intelligence (AGI). He proposes that AGI requires architectures that learn continually and move beyond pattern matching to build causal models, and cites the "Einstein test": an LLM trained only on pre-1916 physics would fail to derive relativity.
Key takeaway
AI scientists and ML engineers designing next-generation systems should recognize that current LLMs, while adept at Bayesian updating and correlation, lack post-training plasticity and causal reasoning. To advance toward AGI, prioritize architectures that enable continual learning and build causal models that go beyond Shannon entropy, rather than relying solely on larger models or more training data.
Key insights
LLMs, particularly transformers, perform precise Bayesian updating but lack the plasticity and causal reasoning essential for AGI.
Principles
- Transformers update predictions through mathematically precise Bayesian inference.
- AGI demands continual learning and causal model building.
- Achieving AGI requires architectural shifts, not just scale.
Method
The "Bayesian wind tunnel" tests blank (not pretrained) architectures on tasks that cannot be memorized and whose Bayesian posteriors are known analytically, so each model's belief updates can be measured against the exact answer.
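The idea of measuring a predictor against an analytically known posterior can be illustrated with a toy example. The sketch below is not Misra's benchmark: the coin-flip task, the uniform Beta(1, 1) prior, and every function name are assumptions chosen for illustration. Each sequence draws a fresh hidden coin bias, so the sequences cannot be memorized, and the exact Bayesian prediction has a closed form. The gap between a predictor and that exact answer is reported as an average KL divergence in bits.

```python
import math
import random


def bayes_predictive(heads, tails, alpha=1.0, beta=1.0):
    """Exact posterior predictive P(next flip = heads | data) under a Beta(alpha, beta) prior."""
    return (heads + alpha) / (heads + tails + alpha + beta)


def kt_predictor(heads, tails):
    """Hypothetical stand-in for a trained model: a predictor that assumes a Beta(0.5, 0.5) prior."""
    return (heads + 0.5) / (heads + tails + 1.0)


def kl_bits(p, q):
    """KL divergence D(p || q) in bits between two Bernoulli distributions."""
    total = 0.0
    for pi, qi in ((p, q), (1.0 - p, 1.0 - q)):
        if pi > 0.0:
            total += pi * math.log2(pi / qi)
    return total


def wind_tunnel_gap(predictor, num_sequences=200, length=20, seed=0):
    """Average gap in bits between a predictor and the exact Bayesian predictive."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(num_sequences):
        bias = rng.random()  # fresh hidden bias per sequence, so nothing can be memorized
        heads = tails = 0
        for _ in range(length):
            exact = bayes_predictive(heads, tails)
            total += kl_bits(exact, predictor(heads, tails))
            count += 1
            if rng.random() < bias:
                heads += 1
            else:
                tails += 1
    return total / count


exact_gap = wind_tunnel_gap(bayes_predictive)  # the exact updater has zero gap
kt_gap = wind_tunnel_gap(kt_predictor)         # a mismatched prior leaves a positive gap
print(f"exact updater gap: {exact_gap:.6f} bits")
print(f"mismatched-prior gap: {kt_gap:.6f} bits")
```

In an actual experiment, the `predictor` argument would be replaced by a sequence model trained from scratch on such sequences. The smaller its gap, the closer its updates are to exact Bayesian inference.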
In practice
- Use Token Probe to visualize an LLM's next-token probability distributions.
- Use in-context learning to teach a model a custom domain-specific language (DSL) for a specific task.
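To make "probability distribution" concrete, here is a minimal sketch of turning a model's raw next-token scores (logits) into probabilities and summarizing them by Shannon entropy in bits. It does not reproduce Token Probe, whose interface is not described here. The candidate tokens and logit values are made up for illustration, and the function names are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def top_k(tokens, probs, k=3):
    """Return the k most probable (token, probability) pairs."""
    return sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)[:k]


# Hypothetical candidate tokens and logits; a real model would supply these.
tokens = ["cat", "dog", "car", "the"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
for token, p in top_k(tokens, probs):
    print(f"{token}: {p:.3f}")
print(f"entropy: {entropy_bits(probs):.3f} bits")
```

Low entropy means the model is confident about the next token, while entropy near log2 of the vocabulary size means it is close to guessing uniformly.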
Topics
- Large Language Models
- Bayesian Inference
- Artificial General Intelligence
- Causal Models
- Continual Learning
- Transformer Architecture
- Machine Learning Theory
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by The a16z Show.