SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects
Summary
SceneCode is a framework that generates physically interactable indoor scenes by compiling natural language prompts into executable, code-driven worlds rather than static meshes. Developed by researchers from The Chinese University of Hong Kong, Shanghai Jiao Tong University, Shanghai AI Laboratory, Microsoft, and the University of Oxford, it targets two limitations of existing scene generators: weak object-level controllability and the lack of on-demand asset production. A room-level agent plans the layout and emits AssetRequests, each of which is routed to one of five code-generation strategies that synthesize part-wise Blender Python programs. These programs are validated through an execution-guided repair-and-refine loop, compiled into simulation-ready SDF assets, and registered in a persistent scene-state registry. SceneCode generates scenes that follow the prompt more faithfully, produces assets with cleaner mesh structures, and attaches simulator-loadable articulation metadata. It outperforms baselines such as SceneSmith, HSM, and LayoutVLM in semantic fidelity and physical usability.
Key takeaway
For robotics engineers and embodied AI developers building simulation environments, SceneCode enables programmatic generation of physically interactable, editable indoor scenes. You can create custom articulated objects with specific attributes on demand instead of relying on a fixed asset library. Because objects retain explicit part structure and simulation-ready articulation, the resulting training environments are more diverse and realistic, and they directly support complex robot manipulation tasks and policy evaluation.
Key insights
SceneCode generates editable, interactable indoor scenes as executable code, not static meshes.
Principles
- Programmatic generation enables on-demand interactable assets.
- Execution-guided validation improves code reliability.
- Code representation couples visual, semantic, and simulation data.
Method
SceneCode converts natural language prompts into structured house layouts and AssetRequests via a planner–designer–critic loop. Each request is routed to one of five VLM-based code-generation strategies, which synthesizes a part-wise Blender Python program. The program is validated by an execution-guided repair-and-refine loop and then compiled into a simulation-ready SDF asset.
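The repair half of the execution-guided loop can be illustrated with a minimal sketch. It is not SceneCode's implementation: the `build()` entry point, the `run_candidate` and `repair_loop` helpers, and the `repair` callback (a stand-in for a VLM call that receives the failing source and its error text) are all hypothetical names chosen for illustration, and the sketch runs in plain Python rather than inside Blender.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(source: str) -> Optional[str]:
    """Execute a candidate program; return None on success, else the error text."""
    namespace: dict = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        # Assumed convention: the program defines build() returning a dict of parts.
        parts = namespace["build"]()
    except Exception:
        return traceback.format_exc()
    if not parts:
        return "build() returned no parts"
    return None


def repair_loop(
    source: str,
    repair: Callable[[str, str], str],
    max_repairs: int = 3,
) -> Tuple[str, int]:
    """Run the program, feeding each failure back to `repair` until it executes."""
    for repairs_used in range(max_repairs + 1):
        error = run_candidate(source)
        if error is None:
            return source, repairs_used
        if repairs_used < max_repairs:
            # Hypothetical VLM call: (failing source, error text) -> revised source.
            source = repair(source, error)
    raise RuntimeError(f"program still failing after {max_repairs} repairs")
```

The loop treats execution itself as the validator: a program is accepted only once it runs and returns at least one part, and the error text is the only signal passed back to the repair step. The refinement stage mentioned above is not modeled here.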
In practice
- Generate custom articulated objects with specific materials.
- Edit object parameters (e.g., leaf count) in Blender Python.
- Integrate generated assets directly into MuJoCo for robot interaction.
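As a rough illustration of why a parametric, part-wise program makes such edits cheap, the sketch below regenerates an articulated asset description from a single `leaf_count` parameter. It is an assumption-laden stand-in, not SceneCode output: the `cabinet_sdf` function and its parameters are hypothetical, "leaf" is assumed to mean a hinged door leaf (the summary does not say), geometry is omitted, and the code emits SDF-style XML with the standard library instead of running in Blender.

```python
import xml.etree.ElementTree as ET


def cabinet_sdf(leaf_count: int, width: float = 0.8, max_open_rad: float = 1.57) -> str:
    """Return an SDF-style model with one revolute hinge per door leaf."""
    if leaf_count < 1:
        raise ValueError("leaf_count must be at least 1")
    sdf = ET.Element("sdf", version="1.7")
    model = ET.SubElement(sdf, "model", name="cabinet")
    ET.SubElement(model, "link", name="body")
    leaf_width = width / leaf_count
    for i in range(leaf_count):
        # Each leaf is its own link, so it keeps an explicit part identity.
        leaf = ET.SubElement(model, "link", name=f"leaf_{i}")
        # Pose is "x y z roll pitch yaw"; leaves sit side by side along x.
        ET.SubElement(leaf, "pose").text = f"{i * leaf_width:.3f} 0 0 0 0 0"
        joint = ET.SubElement(model, "joint", name=f"hinge_{i}", type="revolute")
        ET.SubElement(joint, "parent").text = "body"
        ET.SubElement(joint, "child").text = f"leaf_{i}"
        axis = ET.SubElement(joint, "axis")
        ET.SubElement(axis, "xyz").text = "0 0 1"  # hinge rotates about the vertical axis
        limit = ET.SubElement(axis, "limit")
        ET.SubElement(limit, "lower").text = "0"
        ET.SubElement(limit, "upper").text = str(max_open_rad)
    return ET.tostring(sdf, encoding="unicode")
```

Calling `cabinet_sdf(2)` and `cabinet_sdf(3)` yields models with two and three hinged leaves respectively; changing one argument updates the links and joints together, which a static mesh cannot do.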
Topics
- Indoor Scene Synthesis
- Programmatic World Generation
- Blender Python
- Articulated Objects
- Embodied AI
- Robotic Manipulation
- Simulation Environments
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.