PatchWorld: Gradient-Free Optimization of Executable World Models

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

PatchWorld is a gradient-free framework that converts offline trajectories into executable Python world models through counterexample-guided code repair. Instead of relying on a black-box model to predict the next observation, it induces symbolic belief-state programs that can be inspected, replayed, and patched locally. Across seven AgentGym environments, PatchWorld-Simple achieved the highest planning score among code-based world models, reaching 76.4% macro success in live one-step lookahead while making no LLM calls inside its world-model prediction module. A variant, PatchWorld-Residual, adds a human-specified residual-memory bias and achieved better surface observation fidelity, at 0.69 macro Token F1. The results expose a critical trade-off in executable world models: gains in observation fidelity can come at the cost of decision utility, and vice versa. The framework's code is publicly available.
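To make the planning setup concrete, here is a minimal sketch of one-step lookahead over an executable world model. The `BeliefState` fields, the `ToyWorldModel.predict` interface, the key-and-door environment, and the `progress` heuristic are all hypothetical illustrations, not PatchWorld's actual program or API. The point is that every candidate action is simulated by ordinary Python, so the lookahead step itself needs no LLM call.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class BeliefState:
    """Hypothetical symbolic belief state for a toy key-and-door environment."""
    location: str
    has_key: bool = False
    door_open: bool = False


class ToyWorldModel:
    """Hypothetical executable world model: plain Python rules, no LLM calls."""

    def predict(self, s: BeliefState, action: str) -> Tuple[BeliefState, str, float]:
        # Returns (next belief state, predicted observation, reward).
        if action == "take key" and s.location == "hall" and not s.has_key:
            return replace(s, has_key=True), "You take the key.", 0.0
        if action == "open door" and s.has_key and not s.door_open:
            return replace(s, door_open=True), "The door swings open.", 0.0
        if action == "go outside" and s.door_open:
            return replace(s, location="garden"), "You step into the garden.", 1.0
        return s, "Nothing happens.", 0.0


def progress(s: BeliefState) -> float:
    # Hypothetical heuristic value of a belief state (more subgoals reached = better).
    return 0.1 * float(s.has_key) + 0.2 * float(s.door_open)


def one_step_lookahead(
    model: ToyWorldModel,
    s: BeliefState,
    actions: List[str],
    value: Callable[[BeliefState], float],
) -> Optional[str]:
    """Simulate each action once and return the one with the best reward + value."""
    best_action, best_score = None, float("-inf")
    for a in actions:
        next_s, _obs, reward = model.predict(s, a)
        score = reward + value(next_s)
        if score > best_score:  # strict '>' keeps the earliest action on ties
            best_action, best_score = a, score
    return best_action


model = ToyWorldModel()
actions = ["take key", "open door", "go outside"]
state = BeliefState(location="hall")
for _ in range(3):
    chosen = one_step_lookahead(model, state, actions, progress)
    state, obs, _ = model.predict(state, chosen)
    print(f"{chosen} -> {obs}")
```

The loop advances on the model's own predictions only to keep the sketch self-contained; in a live evaluation the chosen action would be executed in the real environment, which supplies the next observation.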

Key takeaway

AI engineers building world models for text-based agents should consider PatchWorld's gradient-free approach, which produces inspectable, executable Python programs. If planning utility with no LLM calls during lookahead is the priority, choose PatchWorld-Simple (76.4% macro success). If observation fidelity matters most, choose PatchWorld-Residual (0.69 macro Token F1). The two goals trade off against each other, so pick the variant that matches your application's needs.
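The fidelity metric can be illustrated with a small sketch. It assumes Token F1 is the common SQuAD-style token-overlap F1 and that "macro" means an unweighted mean of per-environment averages; the lowercase-and-whitespace tokenization and the example strings are assumptions, since the summary does not specify PatchWorld's exact normalization.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a logged observation."""
    pred = predicted.lower().split()  # assumed normalization: lowercase + whitespace split
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)  # both empty -> 1.0, exactly one empty -> 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def macro_token_f1(pairs_by_env: Dict[str, List[Tuple[str, str]]]) -> float:
    """Average Token F1 within each environment, then average across environments."""
    per_env = [mean(token_f1(p, r) for p, r in pairs) for pairs in pairs_by_env.values()]
    return mean(per_env)


pairs_by_env = {
    "env_a": [("The door opens.", "The door opens.")],
    "env_b": [
        ("Nothing happens.", "You see a key."),
        ("You see a key.", "You see a key."),
    ],
}
print(token_f1("you take the key", "you take the brass key"))  # ≈ 0.889
print(macro_token_f1(pairs_by_env))  # 0.75
```

Macro averaging weights every environment equally regardless of how many steps it logged: here the macro score is 0.75, whereas a per-step (micro) mean over the same three pairs would be 2/3.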

Key insights

PatchWorld generates inspectable, executable world models from offline data via gradient-free, counterexample-guided code repair.

Principles

Method

PatchWorld uses an LLM to synthesize an initial Python world model, then iteratively repairs it through counterexample-guided discrete search: failures found by replaying the offline trajectories drive the search, and a validation gate screens candidate repairs.
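The repair loop can be sketched as follows, under heavy assumptions: the world model is a Python source string defining a hypothetical `step(state, action)` function, a candidate patch is a single text substitution, and `toy_proposer` stands in for the LLM with two hard-coded candidates. The acceptance rule (training-replay accuracy must rise while held-out replay accuracy must not fall) is one plausible reading of a validation gate, not necessarily the paper's.

```python
from typing import Callable, Dict, List, Optional, Tuple

Trajectory = List[Tuple[str, str]]     # (action, logged next observation)
Patch = Tuple[str, str]                # (text to replace, replacement text)
Counterexample = Tuple[str, str, str]  # (action, logged, predicted)

# Hypothetical initial program with a bug: doors open even without the key.
INITIAL_SOURCE = '''
def step(state, action):
    state = dict(state)
    if action == "take key":
        state["has_key"] = True
        return state, "You take the key."
    if action == "open door":
        state["door_open"] = True
        return state, "The door swings open."
    return state, "Nothing happens."
'''


def compile_model(source: str) -> Callable:
    namespace: Dict[str, object] = {}
    exec(source, namespace)  # runs arbitrary code: sandbox LLM-written patches in practice
    return namespace["step"]


def replay(source: str, trajectories: List[Trajectory]) -> Tuple[float, Optional[Counterexample]]:
    """Replay logged trajectories; return step accuracy and the first mismatch."""
    step = compile_model(source)
    hits, total, first = 0, 0, None
    for traj in trajectories:
        state = {"has_key": False, "door_open": False}
        for action, logged in traj:
            state, predicted = step(state, action)
            total += 1
            if predicted == logged:
                hits += 1
            elif first is None:
                first = (action, logged, predicted)
    return hits / total, first


def repair(source: str, train: List[Trajectory], val: List[Trajectory],
           propose: Callable[[str, Counterexample], List[Patch]], max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        train_acc, cex = replay(source, train)
        if cex is None:
            break  # training replay is fully reproduced
        val_acc, _ = replay(source, val)
        for old, new in propose(source, cex):
            if old not in source:
                continue
            candidate = source.replace(old, new, 1)  # local patch; rest of program untouched
            try:
                cand_train, _ = replay(candidate, train)
                cand_val, _ = replay(candidate, val)
            except Exception:
                continue  # patches that fail to compile or run are rejected
            # Validation gate: improve on training replay without regressing held-out replay.
            if cand_train > train_acc and cand_val >= val_acc:
                source = candidate
                break
        else:
            break  # no candidate passed the gate this round
    return source


def toy_proposer(source: str, cex: Counterexample) -> List[Patch]:
    # Hypothetical stand-in for an LLM; a real proposer would be prompted with `cex`.
    return [
        ('"The door swings open."', '"The door is locked."'),  # overfits: every door locked
        ('if action == "open door":',
         'if action == "open door" and not state["has_key"]:\n'
         '        return state, "The door is locked."\n'
         '    if action == "open door":'),  # narrow fix: locked only without the key
    ]


train = [[("open door", "The door is locked.")], [("take key", "You take the key.")]]
val = [[("take key", "You take the key."), ("open door", "The door swings open.")]]
repaired = repair(INITIAL_SOURCE, train, val, toy_proposer)
print(replay(repaired, train)[0], replay(repaired, val)[0])  # 1.0 1.0
```

The first candidate would fix the training counterexample by declaring every door locked, but it breaks the held-out trajectory, so the gate rejects it and the narrower patch is kept.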

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.