Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision) · Depth: Expert, quick

Summary

Code-as-Room is an agentic framework, built on a multimodal large language model (MLLM), that generates realistic 3D indoor rooms from top-down view images by synthesizing Blender code. It targets two limitations of prior work: text-based methods struggle to convey precise spatial information, and image-conditioned agents become unstable when asked to generate a whole room holistically. Code-as-Room parses a reference top-down image to extract scene elements and their spatial relationships, then generates executable Blender code for geometry, materials, and lighting through a multi-stage pipeline. A cross-stage memory module counters context forgetting, a common issue in agent-based frameworks. The researchers also introduce a new benchmark for code-based 3D room synthesis and use it to compare their execution harness against existing agent-based methods.

Key takeaway

If you build 3D content generation systems, Code-as-Room suggests two design choices worth considering: representing the scene as code, and adding a cross-stage memory module to your agent-based MLLM framework. Both aim to improve stability and precision when synthesizing complex 3D environments from visual input, particularly for applications that need detailed spatial control, such as virtual reality or embodied AI.

Key insights

Code-as-Room generates 3D rooms from top-down images using MLLM-driven Blender code synthesis with a structured execution harness.

Method

The framework parses a top-down image, extracts scene elements and spatial relationships, then synthesizes Blender code for geometry, materials, and lighting in a multi-stage pipeline.
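The staged flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SceneObject`, `StageMemory`, `geometry_stage`, `materials_stage`, `lighting_stage`, and `run_pipeline` are hypothetical, the MLLM parsing step is replaced by a hand-written layout, and the light height of 2.5 is an arbitrary assumption. Each stage emits Blender Python as text (using `bpy` calls from the Blender 2.8+ API) and reads or writes a shared memory object, so later stages do not have to rediscover what earlier stages did.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One parsed scene element (hypothetical schema, units assumed to be metres)."""
    name: str
    x: float
    y: float
    width: float
    depth: float
    height: float
    color: tuple  # RGBA, each channel in [0, 1]


@dataclass
class StageMemory:
    """Shared notes passed between stages (an assumed stand-in for cross-stage memory)."""
    objects: list
    created_names: list = field(default_factory=list)
    log: list = field(default_factory=list)


def geometry_stage(memory):
    """Emit code that adds one scaled unit cube per parsed object."""
    lines = []
    for o in memory.objects:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, "
            f"location=({o.x}, {o.y}, {o.height / 2}))"
        )
        lines.append(f"bpy.context.active_object.name = {o.name!r}")
        lines.append(
            f"bpy.context.active_object.scale = ({o.width}, {o.depth}, {o.height})"
        )
        memory.created_names.append(o.name)  # remembered for later stages
    memory.log.append(f"geometry: {len(memory.created_names)} objects")
    return lines


def materials_stage(memory):
    """Emit code that assigns a flat colour to every object the geometry stage created."""
    lines = []
    for o in memory.objects:
        if o.name not in memory.created_names:
            continue  # skip anything geometry did not actually build
        lines.append(f"mat = bpy.data.materials.new(name={o.name + '_mat'!r})")
        lines.append(f"mat.diffuse_color = {o.color}")
        lines.append(f"bpy.data.objects[{o.name!r}].data.materials.append(mat)")
    memory.log.append("materials: assigned")
    return lines


def lighting_stage(memory):
    """Emit code that places one point light above the centroid of the objects."""
    cx = sum(o.x for o in memory.objects) / len(memory.objects)
    cy = sum(o.y for o in memory.objects) / len(memory.objects)
    memory.log.append("lighting: 1 point light")
    return [f"bpy.ops.object.light_add(type='POINT', location=({cx}, {cy}, 2.5))"]


def run_pipeline(memory):
    """Run the three stages in order and return one Blender script as a string."""
    lines = ["import bpy"]
    for stage in (geometry_stage, materials_stage, lighting_stage):
        lines.extend(stage(memory))
    return "\n".join(lines)


if __name__ == "__main__":
    layout = [
        SceneObject("bed", 1.0, 2.0, 1.6, 2.0, 0.5, (0.8, 0.8, 0.9, 1.0)),
        SceneObject("desk", 3.0, 0.5, 1.2, 0.6, 0.75, (0.5, 0.3, 0.1, 1.0)),
    ]
    print(run_pipeline(StageMemory(objects=layout)))
```

The sketch only produces the script text; running it requires Blender's bundled Python, for example by pasting the output into Blender's scripting workspace. Keeping generation separate from execution is one way to let a harness inspect or validate each stage's code before it touches the scene.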

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.