Why Scale Will Not Solve AGI | Vishal Misra - The a16z Show

Source: a16z · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

Vishal Misra's research models a Large Language Model (LLM) as a giant, sparse matrix representing token probability distributions, and uses that picture to explain in-context learning as real-time Bayesian updating. He first demonstrated the idea empirically with GPT-3 on a cricket database query problem (deployed at ESPN in 2020). His subsequent "Bayesian wind tunnel" research mathematically proved that Transformer architectures precisely achieve Bayesian posteriors, unlike LSTMs or MLPs. Misra argues that while LLMs excel at correlation (Shannon entropy), they lack two things that are crucial for true Artificial General Intelligence (AGI), as exemplified by the "Einstein test": human-like plasticity, since their weights are frozen after training, and causal reasoning (Kolmogorov complexity, simulation). Future AGI development therefore requires addressing continual learning and moving from correlation to causation, potentially building on Judea Pearl's causal hierarchy.
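To make "in-context learning as Bayesian updating" concrete, here is a minimal sketch of sequential Bayesian updating over a small set of hypotheses. The coin-bias setup, the function name `update_posterior`, and all numbers are illustrative assumptions, not taken from Misra's work; the sketch only shows the kind of computation the summary says a Transformer approximates from its context.

```python
def update_posterior(prior, likelihoods, observations):
    """Return the posterior over hypotheses after seeing observations in order.

    prior        -- list of prior probabilities, one per hypothesis
    likelihoods  -- list of functions; likelihoods[i](obs) = P(obs | hypothesis i)
    observations -- sequence of observed symbols (the "context")
    """
    post = list(prior)
    for obs in observations:
        # Bayes' rule: weight each hypothesis by how well it explains obs.
        post = [p * lik(obs) for p, lik in zip(post, likelihoods)]
        total = sum(post)
        post = [p / total for p in post]  # renormalise to sum to 1
    return post


# Hypothetical toy task: three coins with different heads probabilities.
biases = [0.2, 0.5, 0.8]
likelihoods = [lambda x, b=b: b if x == "H" else 1.0 - b for b in biases]
prior = [1 / 3, 1 / 3, 1 / 3]

# The "prompt": each extra observation sharpens the posterior.
context = ["H", "H", "T", "H"]
post = update_posterior(prior, likelihoods, context)

# Posterior predictive probability that the next symbol is heads.
p_next_heads = sum(p * b for p, b in zip(post, biases))

print("posterior over biases:", [round(p, 4) for p in post])
print("P(next = H | context):", round(p_next_heads, 4))
```

No weights change anywhere in this sketch; only the posterior moves as the context grows, which is the sense in which the updating happens "in context."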

Key takeaway

LLMs, particularly Transformers, are mathematically proven to perform precise Bayesian updating for in-context learning. A "Bayesian wind tunnel" approach showed Transformers matching true posterior distributions to within $10^{-3}$ bits, outperforming LSTMs and MLPs. This clarifies that LLM learning is correlational (Shannon entropy), but indicates that AGI requires new architectures for plasticity and causal reasoning (Kolmogorov complexity), beyond current scaling efforts.
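One way to read "matching the true posterior to within $10^{-3}$ bits" is as a divergence between the analytically known posterior and the model's predicted distribution. The summary does not say which metric was used, so the sketch below assumes KL divergence in bits; the function name `kl_bits` and both distributions are hypothetical.

```python
import math


def kl_bits(p_true, q_model):
    """KL divergence D(p_true || q_model) in bits (log base 2)."""
    return sum(
        p * math.log2(p / q)
        for p, q in zip(p_true, q_model)
        if p > 0  # terms with p == 0 contribute nothing
    )


# Hypothetical numbers: an exact posterior and a model's prediction of it.
true_posterior = [0.0374, 0.3649, 0.5977]
model_prediction = [0.0380, 0.3650, 0.5970]  # assumed model output

gap = kl_bits(true_posterior, model_prediction)
print(f"gap = {gap:.2e} bits; within 1e-3 bits: {gap < 1e-3}")
```

The point of a controlled "wind tunnel" task is that `true_posterior` can be computed exactly, so a gap like this can be measured rather than inferred.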

Best for: Research Scientist, AI Researcher, AI Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by a16z.