Playing Games With TheoryCoder | ARC Prize @ MIT

2025-10-28 · Source: ARC Prize · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, long

Summary

Sam Gershman's lab at Harvard has developed theory-based reinforcement learning (RL) systems, exemplified by TheoryCoder, to address the efficiency and flexibility challenges of AI learning, particularly in video game environments. The aim is to build AI systems that learn games the way humans do, by constructing domain-specific intuitive theories that capture causal laws. Theory-based RL is a form of model-based RL: it learns the underlying causal laws of an environment and uses that knowledge for planning and exploration, and it has demonstrated significantly higher learning efficiency than deep learning systems. TheoryCoder specifically tackles the problem of inferring and using theories efficiently. It separates low-level game mechanics, expressed as Python functions, from high-level abstract predicates and operators, expressed in PDDL (Planning Domain Definition Language), which enables hierarchical planning. The system uses Large Language Models (LLMs) to infer theories from gameplay history and to revise them as new observations arrive. In games such as "Baba Is You" and "Sokoban", it has shown near-perfect success rates with fewer API calls than direct LLM planning.

Key takeaway

AI scientists and machine learning engineers developing agents for complex, dynamic environments such as video games should consider a theory-based reinforcement learning approach. Combined with hierarchical abstraction, as demonstrated by TheoryCoder, this method offers better learning efficiency and scalability than direct LLM planning. Teams should focus on building systems that infer and revise intermediate causal theories, which enables robust, human-like adaptive behavior with fewer computational resources.

Key insights

Theory-based reinforcement learning with hierarchical abstraction offers scalable, human-like efficiency for complex sequential decision problems.

Principles

Intuitive theories capture causal laws.
Hierarchical abstraction improves planning efficiency.
Intermediate theory construction is crucial.

Method

TheoryCoder uses LLMs to infer low-level game mechanics (as Python functions) and high-level abstractions (as PDDL predicates and operators), then combines the two for efficient hierarchical planning. When new observations contradict the current theory, the theory is revised.
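
As a rough illustration of the split described above, the sketch below pairs a low-level transition function with a high-level goal predicate on a tiny Sokoban-like grid. It is not TheoryCoder's code: the grid size, the names `step`, `box_on_goal`, and `plan`, and the use of plain breadth-first search in place of a PDDL planner are all assumptions made for the example.

```python
from collections import deque

# Hypothetical 3x4 grid with no interior walls (an assumption for this sketch).
ROWS, COLS = 3, 4
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def in_bounds(pos):
    r, c = pos
    return 0 <= r < ROWS and 0 <= c < COLS


def step(state, action):
    """Low-level theory: predict the next state for one action.

    A state is (player, boxes): player is (row, col), boxes is a frozenset.
    """
    player, boxes = state
    dr, dc = MOVES[action]
    target = (player[0] + dr, player[1] + dc)
    if not in_bounds(target):
        return state  # moving off the grid does nothing
    if target in boxes:
        beyond = (target[0] + dr, target[1] + dc)
        if not in_bounds(beyond) or beyond in boxes:
            return state  # the box is blocked, so the push fails
        boxes = (boxes - {target}) | {beyond}
    return (target, boxes)


def box_on_goal(state, goal):
    """High-level predicate: is some box on the goal cell?"""
    return goal in state[1]


def plan(start, goal, max_depth=20):
    """Breadth-first search over the low-level model until the predicate holds."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if box_on_goal(state, goal):
            return path
        if len(path) >= max_depth:
            continue
        for action in MOVES:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


start = ((1, 0), frozenset({(1, 1)}))
actions = plan(start, goal=(1, 3))
print(actions)
```

In a hierarchical setup, a predicate such as `box_on_goal` would typically serve as a subgoal chosen by the high-level planner, with the low-level search only responsible for reaching it.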

In practice

Use LLMs for theory inference, not direct planning.
Separate low-level mechanics from high-level abstractions.
Employ hierarchical planning for efficiency.
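
The first point, using the LLM to infer and revise a theory instead of choosing actions directly, can be sketched as a loop that checks a model's predictions against observed transitions and asks for a revision only when they disagree. Everything here is hypothetical: `revise`, `prediction_errors`, and the toy one-dimensional world are invented for illustration, and `stub_revision` is a hand-written stand-in for what would be an LLM call in a system like TheoryCoder.

```python
def prediction_errors(model, history):
    """Return the observed transitions that the model fails to predict."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]


def revise(model, history, propose_revision, max_rounds=5):
    """Revise the model until it explains the history or the budget runs out."""
    rounds = 0
    while rounds < max_rounds and prediction_errors(model, history):
        model = propose_revision(model, prediction_errors(model, history))
        rounds += 1
    return model, rounds


def naive_model(pos, action):
    """Initial theory for a 1-D track: every move succeeds."""
    return pos + 1 if action == "right" else pos - 1


def stub_revision(model, errors):
    """Stand-in for an LLM call: mark moves that failed in the data as blocked."""
    blocked = {(s, a) for (s, a, s2) in errors if s2 == s}

    def revised(pos, action):
        if (pos, action) in blocked:
            return pos
        return model(pos, action)

    return revised


# Observed gameplay: the third move to the right hits an unseen wall.
history = [(0, "right", 1), (1, "right", 2), (2, "right", 2)]
model, rounds = revise(naive_model, history, stub_revision)
print(rounds, prediction_errors(model, history))
```

The point of the design is that the expensive call happens only when the current theory is contradicted, which is one plausible reading of why theory inference needs fewer API calls than querying an LLM for every action.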

Topics

Theory-Based Reinforcement Learning
TheoryCoder System
Large Language Models
Hierarchical Planning
Intuitive Theories

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by ARC Prize.