Dertouzos Distinguished Lecturer: Richard Sutton
Summary
Richard Sutton, a co-architect of modern reinforcement learning and co-author of the foundational "Sutton and Barto" textbook, presented his "OAK" (Options And Knowledge) architecture for achieving superintelligence from experience. The vision follows "The Bitter Lesson," which holds that general methods that leverage computation have historically outperformed approaches that encode human knowledge. OAK extends the consensus agent architecture with "options" for temporal abstraction and "knowledge" in the form of learned beliefs about the consequences of options. In the proposed architecture, agents continuously learn policies, generate new state features, create subproblems from highly ranked features (specifically "reward-respecting subproblems" of feature attainment), learn options that solve those subproblems, and learn transition models of those options. Sutton acknowledges that reliable continual deep learning and meta-learning are crucial prerequisites that are still missing for realizing OAK at scale.
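The following is a minimal tabular sketch of "options" and "knowledge" in code. It assumes a small discrete state space and simple sample averages. OAK itself targets large-scale function approximation, so treat this as an illustration, not Sutton's implementation. Here an option is a policy plus a termination probability. The "knowledge" about an option is a learned model that predicts the discounted reward the option collects and where it stops. The names `Option`, `run_option`, `OptionModel`, and the corridor environment are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Tuple

State = Hashable
Action = int


@dataclass
class Option:
    """An option: a policy plus a state-dependent probability of terminating."""
    name: str
    policy: Callable[[State], Action]
    termination: Callable[[State], float]


def run_option(option: Option, env_step: Callable[[State, Action], Tuple[State, float]],
               state: State, gamma: float) -> Tuple[float, State, float]:
    """Execute an option until it terminates.

    Returns the discounted reward collected, the state where it stopped,
    and gamma**k for the k steps it took.
    """
    ret, disc = 0.0, 1.0
    while True:
        state, reward = env_step(state, option.policy(state))
        ret += disc * reward
        disc *= gamma
        if random.random() < option.termination(state):
            return ret, state, disc


@dataclass
class OptionModel:
    """Hypothetical 'knowledge' about one option: running-mean predictions of its consequences."""
    reward: Dict[State, float] = field(default_factory=lambda: defaultdict(float))
    transition: Dict[State, Dict[State, float]] = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(float)))
    counts: Dict[State, int] = field(default_factory=lambda: defaultdict(int))

    def update(self, start: State, ret: float, end: State, disc: float) -> None:
        """Move the predictions for `start` toward one observed execution."""
        self.counts[start] += 1
        step = 1.0 / self.counts[start]
        self.reward[start] += step * (ret - self.reward[start])
        row = self.transition[start]
        # Running mean of the target vector disc * one_hot(end):
        # shrink every entry toward 0, then credit the observed end state.
        for s in list(row):
            row[s] -= step * row[s]
        row[end] += step * disc


def corridor_step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical 5-state corridor; reward 1 on reaching state 4."""
    nxt = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == 4 else 0.0)


go_right = Option("go_right", policy=lambda s: 1,
                  termination=lambda s: 1.0 if s == 4 else 0.0)
model = OptionModel()
ret, end, disc = run_option(go_right, corridor_step, 0, gamma=0.9)
model.update(0, ret, end, disc)
```

Running `go_right` once from state 0 with gamma = 0.9 takes four steps. The model therefore predicts a discounted reward of 0.9**3 and a discounted stopping probability of 0.9**4 for state 4. Later executions move these predictions toward their running averages.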
Key takeaway
If you are an AI scientist or machine learning engineer designing general intelligence systems, prioritize architectures that enable agents to discover their own abstractions and skills from raw experience. Avoid building in extensive domain-dependent knowledge, as this approach limits scalability and future progress. Instead, focus on developing reliable continual deep learning and meta-learning capabilities to foster open-ended, autonomous growth in agents' complexity and conceptual structures.
Key insights
OAK proposes a domain-independent AI architecture that learns abstractions and skills from experience, scaling with computation.
Principles
- General methods leveraging computation outperform encoded human knowledge ("The Bitter Lesson").
- The world is too complex for agents to model exactly; learning and planning must occur at runtime.
- Intelligence should grow from runtime experience, not special training phases or human-built domain knowledge.
Method
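OAK continuously learns policies, generates new state features, and creates subproblems from highly ranked features. It then learns options that solve those subproblems, along with transition models of those options. Finally, it plans with these "jumpy" models, which predict the outcome of a whole option rather than a single step. All of this happens at runtime.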
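Planning with jumpy models can be sketched as value iteration whose backups span whole options. The sketch assumes tabular option models in which each transition prediction already includes the discount accumulated while the option runs. The function `plan_with_option_models`, the options `step` and `to_goal`, and the corridor are hypothetical illustrations, not part of OAK as presented.

```python
from typing import Dict, Hashable, Iterable

State = Hashable
# option -> start state -> expected discounted reward until the option stops
RewardModel = Dict[str, Dict[State, float]]
# option -> start state -> stop state -> discounted probability of stopping there
TransitionModel = Dict[str, Dict[State, Dict[State, float]]]


def plan_with_option_models(states: Iterable[State], reward_model: RewardModel,
                            transition_model: TransitionModel, sweeps: int) -> Dict[State, float]:
    """In-place value iteration whose backups jump over whole options.

    V(s) <- max_o [ r(s, o) + sum_{s'} p(s' | s, o) V(s') ], where p already
    includes the discount accumulated while the option runs. States with no
    available option (e.g. terminal states) keep value 0.
    """
    states = list(states)
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            backups = [
                reward_model[o][s]
                + sum(p * values[s2] for s2, p in transition_model[o][s].items())
                for o in reward_model
                if s in reward_model[o]
            ]
            if backups:
                values[s] = max(backups)
    return values


# Hypothetical 5-state corridor, gamma = 0.9, reward 1 on entering terminal state 4.
GAMMA, GOAL = 0.9, 4
nonterminal = range(GOAL)
# "step" moves right once; "to_goal" is a jumpy option that runs right until the goal.
rewards = {
    "step": {s: 1.0 if s + 1 == GOAL else 0.0 for s in nonterminal},
    "to_goal": {s: GAMMA ** (GOAL - s - 1) for s in nonterminal},
}
transitions = {
    "step": {s: {s + 1: GAMMA} for s in nonterminal},
    "to_goal": {s: {GOAL: GAMMA ** (GOAL - s)} for s in nonterminal},
}
one_step_only = ({"step": rewards["step"]}, {"step": transitions["step"]})

v_jumpy = plan_with_option_models(range(GOAL + 1), rewards, transitions, sweeps=1)
v_step_1 = plan_with_option_models(range(GOAL + 1), *one_step_only, sweeps=1)
v_step_4 = plan_with_option_models(range(GOAL + 1), *one_step_only, sweeps=4)
```

After a single sweep, the planner that can use `to_goal` already has the correct value at state 0 (0.9**3). The planner restricted to the one-step option needs four sweeps to propagate the goal reward back. In this toy setting, that gap is the sense in which jumpy models speed up planning.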
In practice
- Design agents to discover their own abstractions and subproblems.
- Utilize "options" for temporal abstraction in planning.
- Frame subproblems as "reward-respecting" feature attainment: reach a target feature while still accounting for the main task's reward along the way.
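Below is a minimal sketch of scoring one execution of a reward-respecting feature-attainment subproblem. It assumes, as one reading of the idea, that the option keeps the main task's rewards while it runs. When it stops, it receives the main task's value estimate plus a bonus proportional to the attained feature. The function `subproblem_return`, the `bonus_weight` parameter, and the key-collecting example are hypothetical.

```python
from typing import Callable, Hashable, Sequence

State = Hashable


def subproblem_return(
    rewards: Sequence[float],               # main-task rewards received while the option ran
    stop_state: State,                      # state where the option stopped
    main_value: Callable[[State], float],   # main-task value estimate at the stopping state
    feature: Callable[[State], float],      # the feature this subproblem tries to attain
    bonus_weight: float,                    # hypothetical weight on the feature bonus
    gamma: float = 1.0,
) -> float:
    """Return of one option execution under an assumed reward-respecting
    feature-attainment subproblem: main-task rewards are kept, and the stopping
    value is the main-task value plus a bonus for the attained feature."""
    g, disc = 0.0, 1.0
    for r in rewards:
        g += disc * r
        disc *= gamma
    stopping_value = main_value(stop_state) + bonus_weight * feature(stop_state)
    return g + disc * stopping_value


def has_key(s: State) -> float:
    """Hypothetical feature: 1 in any state where the agent holds a key."""
    return 1.0 if s in ("key_a", "key_b") else 0.0


def zero_value(s: State) -> float:
    """Assume a main-task value estimate of 0 everywhere, for simplicity."""
    return 0.0


# Two hypothetical ways of attaining the feature: a cheap route and a costly one.
safe = subproblem_return([-1.0, -1.0], "key_a", zero_value, has_key, bonus_weight=5.0)
risky = subproblem_return([-1.0, -10.0], "key_b", zero_value, has_key, bonus_weight=5.0)
```

A subproblem that rewarded only attaining the feature would score both routes equally. Keeping the main-task rewards makes the option prefer the cheaper route.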
Topics
- Reinforcement Learning
- OAK Architecture
- Temporal Abstraction
- The Bitter Lesson
- Continual Deep Learning
- General Intelligence
Best for: Research Scientist, AI Scientist, AI Student, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MIT CSAIL.