Playing Games With TheoryCoder | ARC Prize @ MIT
Summary
Sam Gershman's lab at Harvard has developed theory-based reinforcement learning (RL) systems, exemplified by TheoryCoder, to address the efficiency and flexibility challenges of AI learning, particularly in video game environments. The aim is to build AI systems that learn games the way humans do, by constructing domain-specific intuitive theories that capture causal laws. Theory-based RL is a form of model-based RL: it learns the underlying causal laws of an environment and uses that knowledge for planning and exploration, demonstrating significantly higher learning efficiency than deep learning systems. TheoryCoder addresses the problem of inferring and using theories efficiently by separating low-level game mechanics, represented as Python functions, from high-level abstract predicates and operators, represented in PDDL (Planning Domain Definition Language), which enables efficient hierarchical planning. The system uses large language models (LLMs) to infer theories from gameplay history and revises them as new observations arrive. In games such as "Baba Is You" and "Sokoban," it achieved near-perfect success rates with fewer API calls than direct LLM planning.
Key takeaway
AI scientists and machine learning engineers developing agents for complex, dynamic environments such as video games should consider a theory-based reinforcement learning approach. This method, particularly with the hierarchical abstraction demonstrated by TheoryCoder, offers better learning efficiency and scalability than direct LLM planning. Teams should focus on building systems that infer and revise intermediate causal theories, which enables robust, human-like adaptive behavior with fewer computational resources.
Key insights
Theory-based reinforcement learning with hierarchical abstraction offers scalable, human-like efficiency for complex sequential decision problems.
Principles
- Intuitive theories capture causal laws.
- Hierarchical abstraction improves planning efficiency.
- Intermediate theory construction is crucial.
Method
TheoryCoder uses LLMs to infer low-level game mechanics (as Python functions) and high-level abstractions (in PDDL), then combines them for efficient hierarchical planning and revises the theory as new observations arrive.
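The split between low-level mechanics and high-level abstractions can be illustrated with a toy example. The sketch below is not TheoryCoder's code. It assumes a hypothetical one-dimensional Sokoban corridor, a hand-written `step` function standing in for the LLM-inferred Python mechanics, and two Python predicates (`next_to_box`, `box_at`) standing in for the high-level PDDL abstractions. All names are illustrative.

```python
from collections import deque

WIDTH = 6  # hypothetical corridor with cells 0..5, walls beyond both ends


def step(state, action):
    """Low-level theory (assumed): predict the next (agent, box) state."""
    agent, box = state
    delta = {"left": -1, "right": 1}[action]
    new_agent = agent + delta
    if not 0 <= new_agent < WIDTH:
        return state  # agent blocked by a wall
    if new_agent == box:
        new_box = box + delta
        if not 0 <= new_box < WIDTH:
            return state  # box blocked by a wall, so nothing moves
        return (new_agent, new_box)  # agent pushes the box
    return (new_agent, box)


def low_level_plan(start, is_goal, actions=("left", "right")):
    """Breadth-first search over the low-level theory until is_goal holds."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path, state
        for action in actions:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # subgoal unreachable under the current theory


# High-level abstractions (assumed): predicates used as ordered subgoals.
def next_to_box(state):
    agent, box = state
    return agent == box - 1  # agent stands immediately left of the box


def box_at(target):
    return lambda state: state[1] == target


def hierarchical_plan(start, subgoals):
    """Achieve each high-level subgoal in turn with the low-level planner."""
    state, full_path = start, []
    for subgoal in subgoals:
        result = low_level_plan(state, subgoal)
        if result is None:
            return None
        path, state = result
        full_path.extend(path)
    return full_path


actions = hierarchical_plan((0, 2), [next_to_box, box_at(4)])
print(actions)  # ['right', 'right', 'right']
```

In this toy setting each subgoal hands the search a smaller problem than the final goal would, which is the intuition behind the efficiency claim for hierarchical planning. In a real system the ordered subgoal list would come from a high-level planner rather than being written by hand.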
In practice
- Use LLMs for theory inference, not direct planning.
- Separate low-level mechanics from high-level abstractions.
- Employ hierarchical planning for efficiency.
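The first point, inferring and revising a theory rather than asking the LLM for every move, can be sketched as a loop that checks the current theory against observed transitions and requests a revision only when a prediction fails. This is a minimal sketch under stated assumptions: `stub_reviser` is a hypothetical stand-in for the LLM call, and the corridor world with a wall is invented for illustration.

```python
def prediction_errors(model, transitions):
    """Return the observed (state, action, next_state) triples the theory gets wrong."""
    return [(s, a, s2) for s, a, s2 in transitions if model(s, a) != s2]


def revise_until_consistent(model, transitions, propose_revision, max_rounds=3):
    """Call the reviser only while the theory contradicts the observations."""
    calls = 0
    for _ in range(max_rounds):
        errors = prediction_errors(model, transitions)
        if not errors:
            break  # theory explains everything seen so far
        model = propose_revision(model, errors)
        calls += 1
    return model, calls


def naive_model(pos, action):
    """Initial theory (assumed): the agent always moves one cell."""
    return pos + (1 if action == "right" else -1)


def stub_reviser(model, errors):
    """Hypothetical stand-in for an LLM: patch the contradicted transitions."""
    overrides = {(s, a): s2 for s, a, s2 in errors}
    return lambda s, a: overrides.get((s, a), model(s, a))


# Observed gameplay: the last move is blocked, which the naive theory misses.
observed = [(0, "right", 1), (1, "right", 2), (2, "right", 2)]

revised, calls = revise_until_consistent(naive_model, observed, stub_reviser)
print(calls)                # 1
print(revised(2, "right"))  # 2
```

Once the revised theory is consistent with the observations, planning can run against it without further calls to the reviser, which is one way to read the reported reduction in API calls.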
Topics
- Theory-Based Reinforcement Learning
- TheoryCoder System
- Large Language Models
- Hierarchical Planning
- Intuitive Theories
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by ARC Prize.