SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering) · Depth: Expert, quick

Summary

SceneCode is a novel framework that transforms natural language prompts into executable, code-driven indoor worlds, addressing limitations of current scene synthesis pipelines that rely on static meshes and curated asset libraries. It formulates physically interactable indoor scene synthesis as programmatic world generation. The system's room-level agentic backbone first converts a prompt into a structured house layout, generating per-object AssetRequests through a planner-designer-critic loop. These requests are then routed to one of five code-generation strategies, producing part-wise Blender Python programs. An execution-guided repair-and-refine loop validates these programs, which are subsequently compiled into simulation-ready assets and exported as SDF for physics simulation. A persistent scene-state registry ensures traceability and local editability. Evaluations across scene-level synthesis, asset quality, human judgment, and robot interaction demonstrate that SceneCode improves prompt-faithful generation and yields assets with cleaner mesh structures and simulator-loadable articulation metadata.
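To make the SDF export step concrete, the following Python sketch builds the kind of articulation metadata an SDF file carries: two links joined by a revolute hinge with joint limits. It is an illustration only, not SceneCode's exporter. The model, link, and joint names (`cabinet`, `body`, `door`, `door_hinge`) and the limit values are assumptions, and a simulator-ready model would also need inertial, visual, and collision elements that are omitted here.

```python
import xml.etree.ElementTree as ET


def cabinet_door_sdf(model_name="cabinet", lower=0.0, upper=1.57):
    """Return a minimal SDF string for a two-link model with one hinge.

    All names and limits are hypothetical; they only illustrate the
    structure of SDF articulation metadata.
    """
    sdf = ET.Element("sdf", version="1.7")
    model = ET.SubElement(sdf, "model", name=model_name)

    # One link per rigid part of the object.
    ET.SubElement(model, "link", name="body")
    ET.SubElement(model, "link", name="door")

    # A revolute joint connects the door to the body.
    joint = ET.SubElement(model, "joint", name="door_hinge", type="revolute")
    ET.SubElement(joint, "parent").text = "body"
    ET.SubElement(joint, "child").text = "door"

    # Rotation axis and angular limits (radians).
    axis = ET.SubElement(joint, "axis")
    ET.SubElement(axis, "xyz").text = "0 0 1"
    limit = ET.SubElement(axis, "limit")
    ET.SubElement(limit, "lower").text = str(lower)
    ET.SubElement(limit, "upper").text = str(upper)

    return ET.tostring(sdf, encoding="unicode")


sdf_text = cabinet_door_sdf()
print(sdf_text)
```

Because the joint type, parent and child links, axis, and limits are explicit in the file, a physics simulator can load the door as a movable part rather than as static geometry.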

Key takeaway

For robotics engineers and AI scientists building embodied-AI or robotic-manipulation simulations, SceneCode replaces static meshes and curated asset libraries with generated, executable scene programs. If you need editable indoor environments with custom, articulated objects, this programmatic world-generation framework is worth evaluating. It creates assets on demand, yields cleaner mesh structures with simulator-loadable articulation metadata, and keeps scenes traceable and locally editable through a persistent scene-state registry, all of which matter for simulation-based policy evaluation.

Key insights

SceneCode generates editable, physically interactable indoor scenes from natural language using executable code.

Method

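A room-level agentic backbone converts the prompt into a structured house layout and, through a planner-designer-critic loop, emits per-object AssetRequests. Each request is routed to one of five code-generation strategies, which produce part-wise Blender Python programs. An execution-guided repair-and-refine loop validates these programs, which are then compiled into simulation-ready assets and exported as SDF.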
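The execution-guided repair step can be pictured with a short Python sketch: run a generated program, capture any error, hand the error back to a repair function, and retry within a fixed budget. This is a generic illustration under stated assumptions, not SceneCode's implementation. The `AssetRequest` fields, the `generate` and `repair` callables (stand-ins for model calls), and the retry budget are all hypothetical, and plain `exec` stands in for running a program inside Blender.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AssetRequest:
    """Hypothetical per-object request; the field names are assumptions."""
    object_id: str
    description: str


@dataclass
class RepairResult:
    program: str
    ok: bool
    attempts: int
    errors: List[str] = field(default_factory=list)


def try_execute(program: str) -> Optional[str]:
    """Run the program in a fresh namespace; return an error message or None."""
    try:
        exec(compile(program, "<asset_program>", "exec"), {})
        return None
    except Exception as exc:  # includes SyntaxError raised by compile()
        return f"{type(exc).__name__}: {exc}"


def repair_and_refine(
    request: AssetRequest,
    generate: Callable[[AssetRequest], str],
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> RepairResult:
    """Execute, collect the error, repair, and retry up to max_attempts times."""
    program = generate(request)
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        error = try_execute(program)
        if error is None:
            return RepairResult(program, True, attempt, errors)
        errors.append(error)
        if attempt < max_attempts:
            # Feed the execution error back to the (hypothetical) repair step.
            program = repair(program, error)
    return RepairResult(program, False, max_attempts, errors)


# Stand-ins for model calls: the first draft has a syntax error, the repair fixes it.
def fake_generate(req: AssetRequest) -> str:
    return "parts = ['door', 'frame'\n"


def fake_repair(program: str, error: str) -> str:
    return "parts = ['door', 'frame']\n"


result = repair_and_refine(
    AssetRequest("cabinet_01", "two-door cabinet"), fake_generate, fake_repair
)
print(result.ok, result.attempts, result.errors)
```

Keeping the collected errors alongside the final program gives a trace of what was fixed, which fits the traceability role the summary attributes to the scene-state registry.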

Best for: Research Scientist, Robotics Engineer, AI Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.