SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

2026-05-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

SceneCode is a framework that generates physically interactable indoor scenes by compiling natural language prompts into executable, code-driven worlds rather than static meshes. Developed by researchers from The Chinese University of Hong Kong, Shanghai Jiao Tong University, Shanghai AI Laboratory, Microsoft, and the University of Oxford, it targets two limitations of existing scene generators: weak object-level controllability and the inability to produce assets on demand. A room-level agent plans the layout and emits AssetRequests, each of which is routed to one of five code-generation strategies that synthesize a part-wise Blender Python program. Each program is validated through an execution-guided repair-and-refine loop, compiled into a simulation-ready SDF asset, and registered in a persistent scene-state registry. Compared with baselines such as SceneSmith, HSM, and LayoutVLM, SceneCode follows prompts more faithfully and produces assets with cleaner mesh structure and simulator-loadable articulation metadata, outperforming them in semantic fidelity and physical usability.

Key takeaway

For robotics engineers and embodied AI developers building simulation environments, SceneCode enables programmatic generation of physically interactable, editable indoor scenes. You can create custom articulated objects with specific attributes on demand instead of drawing from a fixed asset library. Because the objects retain explicit part structure and simulation-ready articulation, the resulting training environments are more diverse and realistic, and they directly support complex robot manipulation tasks and policy evaluation.

Key insights

SceneCode generates editable, interactable indoor scenes as executable code, not static meshes.

Principles

Programmatic generation enables on-demand interactable assets.
Execution-guided validation improves code reliability.
Code representation couples visual, semantic, and simulation data.
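The third principle can be illustrated with a minimal sketch in which one small data structure carries both the semantic part names and the joint metadata a simulator needs. The `HingedPart` class and the `to_sdf` helper are hypothetical names invented for this example. They are not SceneCode's actual schema, which the summary does not describe, and visual geometry is left out for brevity. The output follows the general SDF layout of links and revolute joints.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HingedPart:
    """Hypothetical record for one articulated part (not SceneCode's schema)."""
    name: str                         # semantic label, e.g. "door"
    parent: str                       # link the part is hinged to
    axis: Tuple[float, float, float]  # hinge axis
    lower: float                      # joint lower limit in radians
    upper: float                      # joint upper limit in radians


def to_sdf(model_name: str, base_link: str, parts: List[HingedPart]) -> str:
    """Emit a bare-bones SDF string: one link per part plus a revolute joint."""
    sdf = ET.Element("sdf", version="1.9")
    model = ET.SubElement(sdf, "model", name=model_name)
    ET.SubElement(model, "link", name=base_link)
    for part in parts:
        ET.SubElement(model, "link", name=part.name)
        joint = ET.SubElement(model, "joint", name=f"{part.name}_hinge", type="revolute")
        ET.SubElement(joint, "parent").text = part.parent
        ET.SubElement(joint, "child").text = part.name
        axis = ET.SubElement(joint, "axis")
        ET.SubElement(axis, "xyz").text = " ".join(str(v) for v in part.axis)
        limit = ET.SubElement(axis, "limit")
        ET.SubElement(limit, "lower").text = str(part.lower)
        ET.SubElement(limit, "upper").text = str(part.upper)
    return ET.tostring(sdf, encoding="unicode")


# Example: a cabinet body with a single hinged door.
cabinet_sdf = to_sdf(
    "cabinet", "body",
    [HingedPart(name="door", parent="body", axis=(0, 0, 1), lower=0.0, upper=1.57)],
)
print(cabinet_sdf)
```

Keeping part names and joint limits in the same record means an edit to the program (for example, widening the door's swing) changes the simulation metadata in the same place.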

Method

SceneCode converts natural language prompts into structured house layouts and AssetRequests via a planner–designer–critic loop. Each request is routed to one of five VLM-based code-generation strategies, which produces a part-wise Blender Python program. The program is validated by an execution-guided repair-and-refine loop and then compiled into a simulation-ready SDF asset.
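The execution-guided repair-and-refine idea can be sketched in a few lines: run the generated program, and if it fails, feed the error text back to the generator and try again. This is only an illustration under stated assumptions. The `generate` callable stands in for a VLM call, the `build()` contract is invented for the example, and `stub_generate` is a toy generator that fails once and then succeeds. None of these names come from the paper, and the real loop runs inside Blender rather than plain Python.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_program(source: str) -> Tuple[bool, str]:
    """Execute a candidate program in a fresh namespace; return (ok, error_text)."""
    namespace: dict = {}
    try:
        exec(compile(source, "<asset_program>", "exec"), namespace)
        # Hypothetical contract: the program defines build() returning a list of parts.
        parts = namespace["build"]()
        if not parts:
            return False, "build() returned no parts"
        return True, ""
    except Exception:
        return False, traceback.format_exc(limit=1)


def repair_loop(
    generate: Callable[[str, Optional[str]], str],
    request: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Regenerate with error feedback until the program executes or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(request, feedback)
        ok, error = run_program(source)
        if ok:
            return source
        feedback = error  # the next attempt sees what went wrong
    return None


def stub_generate(request: str, feedback: Optional[str]) -> str:
    """Toy stand-in for a model call: the first draft is buggy, the repair is valid."""
    if feedback is None:
        return "def build():\n    return undefined_name\n"
    return "def build():\n    return ['body', 'door', 'hinge']\n"


validated = repair_loop(stub_generate, "a cabinet with one hinged door")
print(validated)
```

Bounding the number of rounds keeps a persistently failing request from looping forever; a caller can treat `None` as a signal to fall back to another strategy.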

In practice

Generate custom articulated objects with specific materials.
Edit object parameters (e.g., leaf count) in Blender Python.
Integrate generated assets directly into MuJoCo for robot interaction.
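The parameter-editing item above can be illustrated with a small parametric program. The sketch reads "leaf count" as the number of leaves on a potted plant, which is an assumption, and the `build_plant` function and its part dictionaries are hypothetical. It runs in plain Python so the structure stays visible; inside Blender, each part record would be turned into actual geometry by the program.

```python
import math
from typing import Dict, List


def build_plant(
    leaf_count: int = 6,
    stem_height: float = 0.4,
    leaf_length: float = 0.15,
) -> List[Dict]:
    """Return part records for a stem plus evenly spaced leaves (illustrative only)."""
    parts: List[Dict] = [{"name": "stem", "kind": "cylinder", "height": stem_height}]
    for i in range(leaf_count):
        yaw = 2.0 * math.pi * i / leaf_count  # spread leaves evenly around the stem
        parts.append({
            "name": f"leaf_{i}",
            "kind": "leaf",
            "yaw": yaw,
            # Leaf tip position relative to the base of the stem.
            "tip": (leaf_length * math.cos(yaw), leaf_length * math.sin(yaw), stem_height),
        })
    return parts


# Editing the object is a one-argument change rather than a mesh edit.
default_plant = build_plant()
bushier_plant = build_plant(leaf_count=10)
print(len(default_plant), len(bushier_plant))
```

Because each leaf is a named part, a downstream step can address `leaf_3` individually, which a single fused mesh would not allow.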

Topics

Indoor Scene Synthesis
Programmatic World Generation
Blender Python
Articulated Objects
Embodied AI
Robotic Manipulation
Simulation Environments

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.