PatchWorld: Gradient-Free Optimization of Executable World Models
Summary
PatchWorld is a gradient-free framework that converts offline trajectories into executable Python world models through counterexample-guided code repair. Instead of relying on black-box models for next-observation prediction, it induces symbolic belief-state programs that are inspectable, replayable, and locally patchable. Evaluated across seven AgentGym environments, PatchWorld-Simple achieved the highest code-based planning score, reaching 76.4% macro success in live one-step lookahead with no LLM calls inside its world-model prediction module. A variant, PatchWorld-Residual, adds a human-specified residual-memory bias and achieved the best surface observation fidelity, at 0.69 macro Token F1. Together, the two variants expose a trade-off in executable world models: improving observation fidelity can weaken decision utility, and vice versa. The framework's code is publicly available.
Key takeaway
AI engineers building world models for text-based agents should consider PatchWorld's gradient-free approach, which generates inspectable, executable Python programs. If the priority is planning utility with no LLM calls in the world model during lookahead, choose PatchWorld-Simple, which reached 76.4% macro success. If observation fidelity matters most, PatchWorld-Residual reaches 0.69 macro Token F1. The two goals trade off against each other, so pick the variant that matches the application's needs.
Key insights
PatchWorld generates inspectable, executable world models from offline data via gradient-free, counterexample-guided code repair.
Principles
- Executable world models offer interpretability.
- Fidelity and planning utility can diverge.
- LLMs can act as symbolic optimizers.
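To make the fidelity side of that divergence concrete, here is a minimal sketch of a token-level F1 score between a predicted and a reference observation. The tokenization (lowercased whitespace split), the multiset-overlap definition, and the per-environment macro average are assumptions borrowed from common practice; the summary does not say how the paper computes its macro Token F1, and the function names are hypothetical.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Token-level F1 between two observation strings.

    Assumption: tokens are lowercased whitespace-separated words and
    overlap is counted as a multiset intersection.
    """
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens and not ref_tokens:
        return 1.0  # two empty observations match trivially
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_token_f1(pairs_by_env: dict[str, list[tuple[str, str]]]) -> float:
    """Average the per-environment mean F1 so each environment counts equally.

    Assumption: "macro" means an unweighted mean over environments.
    """
    env_means = [
        sum(token_f1(pred, ref) for pred, ref in pairs) / len(pairs)
        for pairs in pairs_by_env.values()
    ]
    return sum(env_means) / len(env_means)
```

A model can score well on this kind of metric by reproducing the wording of observations while still missing the state changes that matter for choosing actions, which is one way fidelity and planning utility can come apart.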
Method
PatchWorld uses an LLM to synthesize an initial Python world model, then iteratively repairs it using counterexample-guided discrete search based on replay failures and a validation gate.
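The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not PatchWorld's implementation: the world model is a plain Python callable rather than source code, the LLM patch proposer is replaced by a hypothetical `propose_patch` callback, and the validation gate is assumed to mean "fix at least one training failure without lowering held-out accuracy". All names here are hypothetical.

```python
from typing import Callable

# (previous observation, action, next observation)
Transition = tuple[str, str, str]
WorldModel = Callable[[str, str], str]


def replay_failures(model: WorldModel, transitions: list[Transition]) -> list[Transition]:
    """Replay logged transitions and return those the model mispredicts."""
    return [t for t in transitions if model(t[0], t[1]) != t[2]]


def accuracy(model: WorldModel, transitions: list[Transition]) -> float:
    if not transitions:
        return 1.0
    return 1.0 - len(replay_failures(model, transitions)) / len(transitions)


def repair(
    model: WorldModel,
    train: list[Transition],
    valid: list[Transition],
    propose_patch: Callable[[WorldModel, Transition], WorldModel],
    max_iters: int = 20,
) -> WorldModel:
    """Gradient-free repair: patch on counterexamples, keep patches that pass the gate."""
    for _ in range(max_iters):
        failures = replay_failures(model, train)
        if not failures:
            break
        accepted = False
        for counterexample in failures:
            candidate = propose_patch(model, counterexample)
            # Assumed validation gate: fewer training failures and
            # no drop in held-out accuracy.
            fixes_train = len(replay_failures(candidate, train)) < len(failures)
            holds_valid = accuracy(candidate, valid) >= accuracy(model, valid)
            if fixes_train and holds_valid:
                model, accepted = candidate, True
                break
        if not accepted:
            break  # no proposed patch passed the gate; stop searching
    return model


def memorizing_patch(model: WorldModel, counterexample: Transition) -> WorldModel:
    """Toy stand-in for the LLM proposer: override the single failing case."""
    prev_obs, action, next_obs = counterexample

    def patched(obs: str, act: str) -> str:
        if (obs, act) == (prev_obs, action):
            return next_obs
        return model(obs, act)

    return patched
```

In this sketch the search is discrete: each step either swaps in a whole candidate program or keeps the current one, so no gradients are involved. A real proposer would edit the model's source code in response to the counterexample instead of memorizing it.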
In practice
- Use PatchWorld for text-agent environment simulation.
- Apply residual memory for high observation fidelity.
- Prioritize planning utility over exact rendering when the model drives decisions.
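As a rough illustration of how an executable world model can drive one-step lookahead without any LLM call in the prediction step, the sketch below picks the action whose predicted next observation scores highest. The `predict` and `score` callables, the toy environment, and the scoring rule are all hypothetical; the summary does not describe how PatchWorld scores predicted observations.

```python
from typing import Callable, Sequence


def one_step_lookahead(
    observation: str,
    actions: Sequence[str],
    predict: Callable[[str, str], str],
    score: Callable[[str], float],
) -> str:
    """Return the action whose predicted next observation scores highest.

    `predict` is the executable world model; `score` is an assumed
    task-specific scoring function. Ties go to the earliest action.
    """
    if not actions:
        raise ValueError("need at least one candidate action")
    return max(actions, key=lambda action: score(predict(observation, action)))


def toy_predict(observation: str, action: str) -> str:
    """Hypothetical hand-written world model for a one-door room."""
    if observation == "You face a closed door." and action == "open door":
        return "The door is open. You see the goal."
    return observation  # every other action leaves the scene unchanged


def toy_score(predicted_observation: str) -> float:
    """Hypothetical scorer: reward predictions that mention the goal."""
    return 1.0 if "goal" in predicted_observation else 0.0


chosen = one_step_lookahead(
    "You face a closed door.",
    ["look", "open door", "wait"],
    toy_predict,
    toy_score,
)
print(chosen)  # prints "open door"
```

Note that the planner only needs the prediction to rank actions correctly, not to render the observation word for word, which is why a model with lower surface fidelity can still plan well.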
Topics
- Executable World Models
- Gradient-Free Optimization
- Counterexample-Guided Repair
- Text-Agent Environments
- Large Language Models
- Planning Utility
- Observation Fidelity
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.