PatchWorld: Gradient-Free Optimization of Executable World Models

2022-06-27 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

PatchWorld is a gradient-free framework that converts offline trajectories into executable Python world models through counterexample-guided code repair. Instead of relying on a black-box model to predict the next observation, it induces symbolic belief-state programs that can be inspected, replayed, and patched locally. Across seven AgentGym environments, PatchWorld-Simple achieved the highest planning score among code-based world models, reaching 76.4% macro success in live one-step lookahead while making no LLM calls inside its world-model prediction module. A variant, PatchWorld-Residual, adds a human-specified residual-memory bias and achieved stronger surface observation fidelity, reaching 0.69 macro Token F1. Together, the two results expose a trade-off in executable world models: improving observation fidelity can weaken decision utility, and vice versa. The framework's code is publicly available.
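
The summary reports observation fidelity as macro Token F1 without stating the tokenizer or normalization. The sketch below assumes the common SQuAD-style definition: lowercase whitespace tokens, F1 over the multiset overlap between a predicted and a gold observation, averaged within each environment and then across environments for the macro score. The function names are illustrative, not taken from the PatchWorld code.

```python
from collections import Counter


def token_f1(predicted: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold observation.

    Assumption: lowercase whitespace tokenization; the paper's exact
    normalization may differ.
    """
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens and not gold_tokens:
        return 1.0  # both empty: treat as a perfect match
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection: each shared token counts min(pred, gold) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_token_f1(per_env_pairs: dict[str, list[tuple[str, str]]]) -> float:
    """Average F1 within each environment, then average across environments.

    Assumes every environment has at least one (predicted, gold) pair.
    """
    env_scores = []
    for pairs in per_env_pairs.values():
        scores = [token_f1(pred, gold) for pred, gold in pairs]
        env_scores.append(sum(scores) / len(scores))
    return sum(env_scores) / len(env_scores)
```

For example, `token_f1("you see a red key", "you see a key")` has precision 4/5 and recall 1, so F1 = 8/9 ≈ 0.889: the extra word costs precision even though every gold token is covered.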

Key takeaway

AI engineers building world models for text-based agents should consider PatchWorld's gradient-free approach, which produces inspectable, executable Python programs. If planning utility with zero LLM calls during lookahead is the priority, use PatchWorld-Simple, which reached 76.4% macro success. If observation fidelity matters most, PatchWorld-Residual reaches 0.69 macro Token F1. Because the two goals trade off against each other, choose the variant that matches your application's needs.
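
In one-step lookahead, the world model rather than an LLM predicts the result of each candidate action, and the agent picks the action whose predicted outcome scores best. The sketch below illustrates that loop under stated assumptions: the toy door environment, the `step(state, action) -> (next_state, observation, reward)` interface, and predicted reward as the scoring signal are all hypothetical; how PatchWorld proposes and scores candidates is not described here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DoorState:
    """Hypothetical toy belief state: is the key held, is the door open?"""
    has_key: bool = False
    door_open: bool = False


def step(state: DoorState, action: str) -> tuple[DoorState, str, float]:
    """Hypothetical executable world model: plain Python, no LLM call."""
    if action == "take key":
        return replace(state, has_key=True), "You take the key.", 0.0
    if action == "open door":
        if state.has_key:
            return replace(state, door_open=True), "The door swings open.", 1.0
        return state, "The door is locked.", 0.0
    return state, "Nothing happens.", 0.0


def one_step_lookahead(state: DoorState, candidates: list[str]) -> str:
    """Simulate each candidate once and return the one with the highest predicted reward.

    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    return max(candidates, key=lambda action: step(state, action)[2])
```

With the key already held, only the simulated `open door` step yields reward, so `one_step_lookahead(DoorState(has_key=True), ["wait", "take key", "open door"])` returns `"open door"`. Every prediction is an ordinary function call, which is what keeps the lookahead free of LLM calls.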

Key insights

PatchWorld generates inspectable, executable world models from offline data via gradient-free, counterexample-guided code repair.

Principles

Executable world models offer interpretability.
Fidelity and planning utility can diverge.
LLMs can act as symbolic optimizers.
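
To make "inspectable, replayable, and locally patchable" concrete, the hypothetical program below shows the general shape such an induced world model could take: the belief state is a plain dictionary, each transition rule is a readable branch, and replaying a logged trajectory pinpoints the first step where prediction and data disagree. The one-room game, the rules, and the function names are illustrative assumptions, not code generated by PatchWorld.

```python
from typing import Optional

Trajectory = list[tuple[str, str]]  # (action, observed text) pairs


def predict(belief: dict[str, str], action: str) -> tuple[dict[str, str], str]:
    """Hypothetical induced world model for a one-room text game.

    Each branch is a human-readable rule, so a wrong prediction can be
    traced to, and fixed in, a single branch.
    """
    belief = dict(belief)  # copy so replay never mutates the caller's state
    if action == "take lamp" and belief.get("lamp_location") == "room":
        belief["lamp_location"] = "inventory"
        return belief, "Taken."
    if action == "inventory":
        items = [k.removesuffix("_location") for k, v in belief.items() if v == "inventory"]
        return belief, "You carry: " + (", ".join(items) or "nothing") + "."
    return belief, "I don't understand that."


def first_mismatch(initial: dict[str, str], trajectory: Trajectory) -> Optional[tuple[int, str, str]]:
    """Replay a logged trajectory; return (step, predicted, observed) at the first disagreement."""
    belief = initial
    for t, (action, observed) in enumerate(trajectory):
        belief, predicted = predict(belief, action)
        if predicted != observed:
            return t, predicted, observed
    return None


start = {"lamp_location": "room"}
log = [("take lamp", "Taken."),
       ("inventory", "You carry: lamp."),
       ("take lamp", "You already have that.")]
cex = first_mismatch(start, log)
```

Replay fails at step 2 because the program has no rule for taking an item that is already carried. That mismatch is a counterexample, and fixing it means editing one branch rather than retraining anything.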

Method

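PatchWorld uses an LLM to synthesize an initial Python world model, then repairs it iteratively with counterexample-guided discrete search: transitions the program fails to replay serve as counterexamples that drive candidate patches, and a validation gate decides which patches are kept.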
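
The repair loop can be sketched as a discrete search over programs. This is a simplified rendition under stated assumptions: states are integers, a "program" is any Python callable, and the hypothetical `propose` function stands in for the LLM that edits source code in PatchWorld. Each round takes the first replay failure as a counterexample, discards candidate patches that do not fix it or do not reduce training failures, and applies a validation gate that rejects any patch lowering held-out replay accuracy. No gradients are involved; candidates are only executed and compared.

```python
from typing import Callable

# Simplified setting (assumption): states are ints, actions are strings,
# and a world model predicts the next state.
Transition = tuple[int, str, int]  # (state, action, observed next state)
Model = Callable[[int, str], int]
Proposer = Callable[[Model, Transition], list[Model]]


def replay_failures(model: Model, data: list[Transition]) -> list[Transition]:
    """Logged transitions the model fails to reproduce: the counterexamples."""
    return [(s, a, o) for s, a, o in data if model(s, a) != o]


def accuracy(model: Model, data: list[Transition]) -> float:
    if not data:
        return 1.0
    return 1.0 - len(replay_failures(model, data)) / len(data)


def repair(model: Model, train: list[Transition], val: list[Transition],
           propose: Proposer, max_rounds: int = 10) -> Model:
    """Gradient-free, counterexample-guided repair with a validation gate."""
    for _ in range(max_rounds):
        failures = replay_failures(model, train)
        if not failures:
            break  # the model replays every training transition
        s, a, o = failures[0]  # take the first counterexample
        current_val = accuracy(model, val)
        best, best_val = None, -1.0
        for candidate in propose(model, (s, a, o)):
            if candidate(s, a) != o:
                continue  # patch does not fix the counterexample
            if len(replay_failures(candidate, train)) >= len(failures):
                continue  # no net progress on the training replay
            val_acc = accuracy(candidate, val)
            if val_acc < current_val:
                continue  # validation gate: reject held-out regressions
            if val_acc > best_val:
                best, best_val = candidate, val_acc
        if best is None:
            break  # no admissible patch; a fuller search could try another counterexample
        model = best
    return model


# Toy usage. This proposer is a hypothetical stand-in for an LLM editing
# source code; it offers one memorizing patch and one general patch.
def toy_proposer(model: Model, cex: Transition) -> list[Model]:
    s0, a0, o0 = cex

    def memorize(s: int, a: str) -> int:
        return o0 if (s, a) == (s0, a0) else model(s, a)

    def generalize(s: int, a: str) -> int:
        return s + (o0 - s0) if a == a0 else model(s, a)

    return [memorize, generalize]


def initial_model(s: int, a: str) -> int:
    return s + 1 if a == "inc" else s  # no rule yet for "dec"


train = [(0, "inc", 1), (3, "dec", 2), (5, "noop", 5)]
val = [(7, "inc", 8), (9, "dec", 8)]
repaired = repair(initial_model, train, val, toy_proposer)
```

Both toy patches fix the `(3, "dec", 2)` counterexample, but only the general rule also reproduces the held-out `(9, "dec", 8)` transition, so the gate-and-select step keeps it over the memorizing patch.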

In practice

Use PatchWorld for text-agent environment simulation.
Apply residual memory for high observation fidelity.
Prioritize planning utility over exact rendering.
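
The summary does not specify how PatchWorld-Residual's human-specified residual-memory bias is implemented. One plausible reading, assumed here purely for illustration, is a lookup table that stores the exact observation text for logged (state, action) pairs the program renders incorrectly and falls back to the program everywhere else. The `ResidualMemory` class and the toy renderer are hypothetical.

```python
from collections.abc import Callable, Hashable

# Hypothetical renderer signature: (state_key, action) -> observation text.
RenderFn = Callable[[Hashable, str], str]


class ResidualMemory:
    """Hypothetical residual-memory wrapper around a program's observation renderer.

    This is one assumed reading of "residual memory", not PatchWorld's code.
    """

    def __init__(self, render: RenderFn) -> None:
        self.render = render
        self.memory: dict[tuple[Hashable, str], str] = {}

    def fit(self, logs: list[tuple[Hashable, str, str]]) -> None:
        # Store logged text only where the program's own rendering is wrong,
        # so the table holds the residual the program fails to explain.
        for state_key, action, observed in logs:
            if self.render(state_key, action) != observed:
                self.memory[(state_key, action)] = observed

    def __call__(self, state_key: Hashable, action: str) -> str:
        key = (state_key, action)
        if key in self.memory:
            return self.memory[key]  # exact logged text for a known situation
        return self.render(state_key, action)  # otherwise defer to the program


def toy_render(state_key: Hashable, action: str) -> str:
    return f"You {action}."


model = ResidualMemory(toy_render)
model.fit([("hall", "look", "A long hall lined with portraits."),
           ("hall", "wait", "You wait.")])
```

In this sketch the table alters only rendered text on logged situations, not the transition rules a planner relies on.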

Topics

Executable World Models
Gradient-Free Optimization
Counterexample-Guided Repair
Text-Agent Environments
Large Language Models
Planning Utility
Observation Fidelity

Code references

HKBU-KnowComp/PatchWorld

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.