HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes
Summary
HomeWorld is a unified hierarchical framework for generating controllable, densely interactive whole-home scenes, aimed at robot simulation and interior design. It handles the complexity of indoor layouts and the scarcity of 3D scene data by decomposing synthesis into stages. First, the authors curate a dataset of 300K real residential floorplans and use it to train a large language model, which generates whole floorplans with fine-grained control using a K-D tree-based representation. Next, image generation models draft furniture layouts from multi-level viewpoints, and a further step lays out small manipulable objects on various surfaces. A VLM-based refiner iteratively corrects placements, and a 3D generative model allows flexible asset replacement. The pipeline completes each scene with basic physical attributes, surface textures, and lighting setups so it can be used for embodied AI. The authors report that HomeWorld outperforms prior methods on layout diversity and 3D design appeal. They plan to release the 300K floorplan dataset and 5K fully furnished scenes.
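To make the K-D tree idea concrete, here is a minimal sketch of how a floorplan could be encoded as recursive axis-aligned splits of a rectangle. The summary does not describe the paper's actual encoding, so the `Room` and `Split` classes, the `ratio` field, and the `layout` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Room:
    """Leaf node: a single room (hypothetical structure)."""
    label: str


@dataclass
class Split:
    """Internal node: one axis-aligned cut through the parent rectangle."""
    axis: str      # "x" cuts along the width, "y" cuts along the height
    ratio: float   # cut position as a fraction of the parent extent
    low: "Node"    # child on the lower-coordinate side of the cut
    high: "Node"   # child on the higher-coordinate side of the cut


Node = Union[Room, Split]
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def layout(node: Node, x: float, y: float, w: float, h: float) -> Dict[str, Rect]:
    """Walk the tree and return the rectangle assigned to each room."""
    if isinstance(node, Room):
        return {node.label: (x, y, w, h)}
    if node.axis == "x":
        cut = w * node.ratio
        rooms = layout(node.low, x, y, cut, h)
        rooms.update(layout(node.high, x + cut, y, w - cut, h))
    else:
        cut = h * node.ratio
        rooms = layout(node.low, x, y, w, cut)
        rooms.update(layout(node.high, x, y + cut, w, h - cut))
    return rooms


# An 8 x 8 footprint: living room on one side, bathroom and bedroom on the other.
plan = Split("x", 0.5, Room("living_room"),
             Split("y", 0.25, Room("bathroom"), Room("bedroom")))
rooms = layout(plan, 0.0, 0.0, 8.0, 8.0)
for label, rect in rooms.items():
    print(label, rect)
```

One appeal of a tree like this is that it serializes to a short nested sequence, which is plausibly convenient for a language model to emit, and every valid tree tiles the footprint without gaps or overlaps. Whether HomeWorld relies on these properties is not stated in the summary.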
Key takeaway
For robotics engineers and AI scientists building embodied AI systems, HomeWorld offers a way to generate controllable, interactive whole-home scenes. If building diverse, realistic simulation environments by hand is a bottleneck, consider the framework's hierarchical approach and the forthcoming release of 5K furnished scenes, which could reduce manual scene creation and broaden the environments available for training and testing.
Key insights
HomeWorld covers the full path from floorplan to furnished scene in a single hierarchical framework, targeting controllable, densely interactive scenes for embodied AI.
Principles
- Hierarchical decomposition simplifies complex scene generation.
- VLM-based refinement corrects furniture and object placement.
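The refinement principle can be sketched as a critique-and-correct loop. In the sketch below the critic is a plain geometric overlap check standing in for the VLM, which is an assumption made so the example runs on its own. The `Box`, `overlap_critic`, and `refine` names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Top-down footprint of one placed item (hypothetical structure)."""
    name: str
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two footprints intersect with non-zero area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


Fix = Tuple[int, Box]  # (index of the item to change, corrected item)


def overlap_critic(layout: List[Box]) -> Optional[Fix]:
    """Stand-in for the VLM: report the first collision and a correction."""
    for i, a in enumerate(layout):
        for j in range(i + 1, len(layout)):
            b = layout[j]
            if overlaps(a, b):
                # Slide the later item along x until it just clears the earlier one.
                return j, replace(b, x=a.x + a.w)
    return None


def refine(layout: List[Box],
           critic: Callable[[List[Box]], Optional[Fix]],
           max_rounds: int = 10) -> List[Box]:
    """Apply one correction per round until the critic finds nothing to fix."""
    layout = list(layout)
    for _ in range(max_rounds):
        fix = critic(layout)
        if fix is None:
            break
        index, corrected = fix
        layout[index] = corrected
    return layout


draft = [Box("sofa", 0.0, 0.0, 2.0, 1.0),
         Box("table", 1.0, 0.0, 1.0, 1.0),
         Box("lamp", 2.5, 0.0, 0.5, 0.5)]
refined = refine(draft, overlap_critic)
print([(b.name, b.x) for b in refined])
```

The `max_rounds` cap matters because a correction can create a new conflict, as happens here when moving the table pushes it into the lamp. This sketch also ignores room boundaries and item orientation, both of which a real refiner would need to consider.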
Method
HomeWorld trains an LLM on 300K real residential floorplans to generate whole floorplans. Image generation models then draft furniture and small-object layouts, a VLM refines the placements, and a 3D generative model supplies the assets. Each scene is finished with basic physical attributes, surface textures, and lighting.
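The staged method can be pictured as a chain in which each stage consumes what earlier stages produced. The sketch below models only that dependency structure with placeholder outputs. The stage names, the `needs` and `adds` fields, and the exact ordering are assumptions inferred from this summary, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One step of the pipeline (hypothetical description, no real models)."""
    name: str
    needs: Tuple[str, ...]  # artifacts that must already exist
    adds: str               # artifact this stage contributes


STAGES: List[Stage] = [
    Stage("floorplan_llm", ("prompt",), "floorplan"),
    Stage("furniture_layout", ("floorplan",), "furniture"),
    Stage("small_object_layout", ("furniture",), "objects"),
    Stage("vlm_refiner", ("furniture", "objects"), "refined_layout"),
    Stage("asset_generation", ("refined_layout",), "assets"),
    Stage("physics_textures_lighting", ("assets",), "scene"),
]


def run_pipeline(prompt: str, stages: List[Stage] = STAGES) -> Dict[str, str]:
    """Run stages in order, failing fast if a stage's inputs are missing."""
    state: Dict[str, str] = {"prompt": prompt}
    for stage in stages:
        missing = [key for key in stage.needs if key not in state]
        if missing:
            raise ValueError(f"{stage.name} is missing inputs: {missing}")
        # Placeholder artifact; a real stage would call a model here.
        state[stage.adds] = f"<{stage.adds} from {stage.name}>"
    return state


state = run_pipeline("two-bedroom apartment")
print(list(state))
```

Declaring inputs explicitly makes the coarse-to-fine ordering checkable: running the stages out of order raises an error rather than silently producing an incomplete scene.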
In practice
- Generate diverse indoor scenes for robot simulation.
- Synthesize interactive environments for embodied AI.
Topics
- Indoor Scene Generation
- Robot Simulation
- Embodied AI
- Floorplan Synthesis
- Furniture Layout
- VLM Refinement
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.