Dertouzos Distinguished Lecturer: Richard Sutton

2026-06-22 · Source: MIT CSAIL · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Intermediate, extended

Summary

Richard Sutton, a co-architect of modern reinforcement learning and co-author of the foundational "Sutton and Barto" textbook, presented his "OAK" (Options And Knowledge) architecture for achieving superintelligence from experience. This vision adheres to "The Bitter Lesson," which posits that general methods leveraging computation historically outperform approaches encoding human knowledge. OAK extends the consensus agent architecture by integrating "options" for temporal abstraction and "knowledge" as learned beliefs about option consequences. The architecture proposes that agents continuously learn policies, generate new state features, create subproblems from highly ranked features (specifically "reward respecting subproblems" for feature attainment), and learn transition models for these options. Sutton acknowledges that reliable continual deep learning and meta-learning are crucial missing prerequisites for OAK's large-scale realization.

Key takeaway

AI scientists and machine learning engineers designing general intelligence systems should prioritize architectures that let agents discover their own abstractions and skills from raw experience. Avoid building in extensive domain-dependent knowledge, because this approach limits scalability and future progress. Instead, focus on developing reliable continual deep learning and meta-learning capabilities to foster open-ended, autonomous growth in agent complexity and conceptual structures.

Key insights

OAK proposes a domain-independent AI architecture that learns abstractions and skills from experience, scaling with computation.

Principles

General methods leveraging computation outperform encoded human knowledge ("The Bitter Lesson").
The world is too complex for agents to model exactly; learning and planning must occur at runtime.
Intelligence should grow from runtime experience, not special training phases or human-built domain knowledge.

Method

OAK continuously learns policies, generates new state features, creates subproblems from ranked features, learns option solutions and transition models, and plans with these "jumpy" models, all at runtime.
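To make "options" and "jumpy" models concrete, here is a minimal Python sketch. It is not Sutton's implementation. The corridor environment, the `go_to` options, and the names `option_model` and `plan` are all hypothetical. One simplification matters most: OAK learns its transition models from experience, whereas this toy world is deterministic, so a single rollout stands in for the learned model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

GAMMA = 0.9      # discount factor (assumed for illustration)
N_STATES = 6     # hypothetical corridor with states 0..5


def step(state: int, action: int) -> Tuple[int, float]:
    """Toy deterministic world: move left (-1) or right (+1); reward 1 on arriving at the last state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 and state != N_STATES - 1 else 0.0
    return nxt, reward


@dataclass(frozen=True)
class Option:
    """A temporally extended action: a policy plus a termination condition."""
    name: str
    policy: Callable[[int], int]
    terminates: Callable[[int], bool]


def go_to(target: int) -> Option:
    """Hypothetical option that walks toward `target` and stops on arrival."""
    return Option(
        name=f"go_to_{target}",
        policy=lambda s: 1 if s < target else -1,
        terminates=lambda s: s == target,
    )


def option_model(state: int, option: Option, max_steps: int = 50) -> Tuple[float, int, float]:
    """Jumpy model of an option: (discounted reward, end state, discount at termination).

    Computed here by one rollout, which is exact only because the toy world is deterministic.
    """
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if option.terminates(state):
            break
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= GAMMA
    return total, state, discount


def plan(state: int, options: List[Option], values: Dict[int, float]) -> Option:
    """One-step lookahead over options: back up each option's model and pick the best."""
    def backed_up(option: Option) -> float:
        reward, end_state, discount = option_model(state, option)
        return reward + discount * values.get(end_state, 0.0)
    return max(options, key=backed_up)


if __name__ == "__main__":
    print(option_model(0, go_to(5)))
    print(plan(0, [go_to(2), go_to(5)], values={}).name)
```

The planner never reasons about individual steps. It only consults each option's summary of where it ends up and what it earns along the way, which is what makes the model "jumpy".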

In practice

Design agents to discover their own abstractions and subproblems.
Utilize "options" for temporal abstraction in planning.
Frame subproblems as "reward respecting feature attainment."
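One way to read "reward respecting feature attainment" is as an objective that still counts the ordinary rewards collected along the way and adds a bonus at stopping time for the value of the chosen feature. The sketch below encodes that reading under stated assumptions. The function name, the `kappa` weight, and the exact form of the stopping bonus (main-problem value estimate plus `kappa` times the feature) are illustrative choices, not details confirmed by the summary.

```python
from typing import Callable, Sequence


def subproblem_return(
    rewards: Sequence[float],
    stop_state: int,
    feature: Callable[[int], float],
    value_estimate: Callable[[int], float],
    kappa: float = 1.0,
    gamma: float = 0.9,
) -> float:
    """Return for one option execution under an assumed reward-respecting objective.

    rewards        : rewards received before the option stopped
    stop_state     : state in which the option stopped
    feature        : the feature the subproblem tries to attain
    value_estimate : the agent's current value estimate for the main problem
    kappa          : assumed weight on attaining the feature
    """
    # Ordinary discounted rewards still count, so the option cannot ignore them.
    discounted = sum((gamma ** t) * r for t, r in enumerate(rewards))
    # The stopping bonus rewards ending in a state where the feature is high.
    stop_bonus = value_estimate(stop_state) + kappa * feature(stop_state)
    return discounted + (gamma ** len(rewards)) * stop_bonus


if __name__ == "__main__":
    at_door = lambda s: 1.0 if s == 3 else 0.0   # hypothetical feature
    no_value = lambda s: 0.0                      # placeholder value estimate
    cheap = subproblem_return([-1.0, -1.0], 3, at_door, no_value, kappa=2.0, gamma=0.5)
    costly = subproblem_return([-10.0], 3, at_door, no_value, kappa=2.0, gamma=0.5)
    print(cheap, costly)
```

In the usage example both trajectories reach the feature, but the one that pays a large penalty on the way scores lower. That is the sense in which the subproblem "respects" reward instead of pursuing the feature at any cost.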

Topics

Reinforcement Learning
OAK Architecture
Temporal Abstraction
The Bitter Lesson
Continual Deep Learning
General Intelligence

Best for: Research Scientist, AI Scientist, AI Student, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT CSAIL.