Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
Summary
Code-as-Room is an MLLM-based agentic framework that generates realistic 3D indoor rooms from top-down view images by synthesizing Blender code. It targets two limitations of existing approaches: text-based methods struggle to convey precise spatial information, and image-conditioned agents become unstable when asked to generate a whole room at once. Code-as-Room parses a reference top-down image to extract scene elements and their spatial relationships, then generates executable Blender code for geometry, materials, and lighting through a multi-stage pipeline. A cross-stage memory module carries information between stages to prevent context forgetting, a common issue in agent-based frameworks. The researchers also introduce a new benchmark for code-based 3D room synthesis and use it to compare their execution harness against existing agent-based methods.
Key takeaway
For research scientists developing 3D content generation systems, Code-as-Room offers an approach to synthesizing complex 3D environments from visual input. Consider adopting a code-as-representation strategy and integrating a cross-stage memory module to improve stability and precision in agent-based MLLM frameworks, particularly for applications that need detailed spatial control, such as virtual reality or embodied AI.
Key insights
Code-as-Room generates 3D rooms from top-down images using MLLM-driven Blender code synthesis with a structured execution harness.
Principles
- Represent 3D rooms as Blender code.
- Mitigate context forgetting via cross-stage memory.
Method
The framework parses a top-down image, extracts scene elements and spatial relationships, then synthesizes Blender code for geometry, materials, and lighting in a multi-stage pipeline.
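The staged flow with a shared memory can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `StageMemory`, the three stage functions, and `run_pipeline` are hypothetical names, the "parsed" objects are supplied by hand, and each stage returns placeholder comment lines where a real system would have an MLLM write Blender code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StageMemory:
    """Hypothetical cross-stage memory: facts and code shared by all stages."""
    facts: dict = field(default_factory=dict)
    code: list = field(default_factory=list)


Stage = Callable[[StageMemory], str]


def geometry_stage(mem: StageMemory) -> str:
    # Placeholder for MLLM-written geometry code (assumption).
    objs = mem.facts["objects"]
    # Record what was placed so later stages do not have to re-derive it.
    mem.facts["placed"] = [o["name"] for o in objs]
    return "\n".join(f"# geometry for {o['name']} at {o['xy']}" for o in objs)


def material_stage(mem: StageMemory) -> str:
    # Reads the names stored by the geometry stage instead of re-parsing.
    return "\n".join(f"# material for {name}" for name in mem.facts["placed"])


def lighting_stage(mem: StageMemory) -> str:
    return f"# lighting for a room with {len(mem.facts['placed'])} objects"


def run_pipeline(parsed_objects: list, stages: list) -> str:
    """Run stages in order, passing the same memory object to each one."""
    mem = StageMemory(facts={"objects": parsed_objects})
    for stage in stages:
        mem.code.append(stage(mem))
    return "\n".join(mem.code)


if __name__ == "__main__":
    layout = [{"name": "bed", "xy": (1.0, 2.0)}]
    print(run_pipeline(layout, [geometry_stage, material_stage, lighting_stage]))
```

The point of the sketch is the single `mem` object: the material and lighting stages depend on what the geometry stage stored, which is the kind of information that is lost when each stage only sees its own prompt.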
In practice
- Use Blender code for precise 3D room generation.
- Employ cross-stage memory in agentic frameworks.
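As a rough illustration of the code-as-representation idea, the snippet below turns a hand-written layout into a Blender Python script as plain text. It is a minimal sketch, not the framework's generator: the `Box` type and `emit_blender_script` function are hypothetical, furniture is approximated by scaled cubes, and the only Blender calls in the emitted script are `bpy.ops.mesh.primitive_cube_add` and `bpy.context.active_object`. The generator itself does not import `bpy`, so it runs outside Blender; the output is meant to be executed inside Blender.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned placeholder for one piece of furniture."""
    name: str
    center: tuple  # (x, y, z) position in metres
    size: tuple    # (width, depth, height) in metres


def emit_blender_script(boxes: list) -> str:
    """Return Blender Python source that adds one scaled cube per box."""
    lines = ["import bpy", ""]
    for b in boxes:
        # A cube with size=1.0 has unit edges, so its scale equals its dimensions.
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={b.center!r})"
        )
        # The newly added cube becomes the active object.
        lines.append("obj = bpy.context.active_object")
        lines.append(f"obj.name = {b.name!r}")
        lines.append(f"obj.scale = {b.size!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    room = [Box("bed", (1.0, 2.0, 0.25), (2.0, 1.5, 0.5))]
    print(emit_blender_script(room))
```

Because the room is stored as text, positions and sizes stay exact and inspectable, and a later stage can append material or lighting code to the same script.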
Topics
- Code-as-Room
- 3D Room Synthesis
- MLLM-based Agents
- Blender Code Generation
- Top-Down View Images
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.