Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Architecture & Urban Planning) · Depth: Expert, extended

Summary

Researchers at Mila introduce a text-based approach to floor plan generation: a large language model (LLM) is first fine-tuned on real floor plans, then trained with reinforcement learning with verifiable rewards (RLVR) to improve adherence to topological and numerical constraints. The method targets a limitation of existing generative models, which often struggle to control room dimensions and areas precisely while also respecting the required connectivity between rooms. Both the input (a bubble diagram plus design requirements such as room sizes) and the output (polygonal room layouts) are represented as JSON, which eases integration with CAD tools. The model is built on Llama-3.3-70B-Instruct and trained in two stages: supervised fine-tuning followed by RLVR. It significantly outperforms baselines such as House-GAN++ and HouseDiffusion across the reported metrics: Compatibility (a 94% relative reduction), Area Adherence, Overlap, Aspect Ratio Adherence, Room Count Adherence, Realism, and Diversity.
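To make the JSON-in, JSON-out idea concrete, here is a minimal Python sketch of what such a pair might look like. The field names (`rooms`, `connections`, `target_area`, `polygon`) and the helper functions are illustrative assumptions, not the schema used in the paper. The sketch only shows that a polygonal layout emitted as text can be parsed and checked against the requested room areas.

```python
import json

# Hypothetical input: a bubble diagram (rooms as nodes, required adjacencies
# as edges) plus numerical requirements. Field names are assumptions.
prompt_spec = {
    "rooms": [
        {"id": "living", "type": "living_room", "target_area": 20.0},
        {"id": "kitchen", "type": "kitchen", "target_area": 9.0},
    ],
    "connections": [["living", "kitchen"]],
}

# Hypothetical output: one polygon per room, as a list of [x, y] vertices.
layout = {
    "rooms": [
        {"id": "living", "polygon": [[0, 0], [5, 0], [5, 4], [0, 4]]},
        {"id": "kitchen", "polygon": [[5, 0], [8, 0], [8, 3], [5, 3]]},
    ]
}


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_report(spec, plan):
    """Map each room id to (generated area, requested area)."""
    targets = {r["id"]: r["target_area"] for r in spec["rooms"]}
    return {
        r["id"]: (polygon_area(r["polygon"]), targets[r["id"]])
        for r in plan["rooms"]
    }


# Round-trip through text, since an LLM would emit the layout as a string.
decoded = json.loads(json.dumps(layout))
report = area_report(prompt_spec, decoded)
```

Because the output is plain JSON with explicit vertex coordinates, the same parsed structure could in principle be handed to a CAD exporter or to the constraint checks used during training.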

Key takeaway

For AI engineers building generative design tools, this research demonstrates a robust way to enforce strict numerical and topological constraints in LLM-based generation. Consider a two-stage pipeline (supervised fine-tuning followed by RLVR) with structured JSON inputs when the application demands high fidelity to user specifications, precise geometric control, and CAD compatibility.

Key insights

LLMs can generate highly constrained, structured floor plans by combining supervised fine-tuning with reinforcement learning with verifiable rewards.

Method

The LLM is fine-tuned in two stages. The first is supervised learning on real floor plans. The second is reinforcement learning with GRPO (Group Relative Policy Optimization), using a reward function that combines constraint adherence metrics with a hard feasibility condition requiring valid outputs.
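The reward design can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's reward. It uses area adherence as the only constraint metric, treats "parses as JSON and contains exactly the requested rooms with non-degenerate polygons" as the hard feasibility condition, and returns zero reward for infeasible outputs. The function names and the JSON fields (`rooms`, `id`, `polygon`) are hypothetical. The last function shows the group-relative normalization that GRPO applies to the rewards of several completions sampled for the same prompt.

```python
import json
import statistics


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def parse_plan(text, targets):
    """Return {room_id: area} if the output passes the hard gate, else None."""
    try:
        plan = json.loads(text)
        areas = {r["id"]: polygon_area(r["polygon"]) for r in plan["rooms"]}
    except (ValueError, KeyError, TypeError):
        return None  # not valid JSON, or not in the assumed schema
    if set(areas) != set(targets) or any(a <= 0 for a in areas.values()):
        return None  # wrong set of rooms, or a degenerate polygon
    return areas


def reward(text, targets):
    """Assumed reward: 0 if infeasible, else 1 minus mean relative area error."""
    areas = parse_plan(text, targets)
    if areas is None:
        return 0.0
    rel_err = [abs(areas[k] - t) / t for k, t in targets.items()]
    return max(0.0, 1.0 - sum(rel_err) / len(rel_err))


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def rect(x0, y0, x1, y1):
    return [[x0, y0], [x1, y0], [x1, y1], [x0, y1]]


targets = {"living": 20.0, "kitchen": 9.0}
samples = [
    json.dumps({"rooms": [{"id": "living", "polygon": rect(0, 0, 5, 4)},
                          {"id": "kitchen", "polygon": rect(5, 0, 8, 3)}]}),
    json.dumps({"rooms": [{"id": "living", "polygon": rect(0, 0, 4, 4)},
                          {"id": "kitchen", "polygon": rect(4, 0, 7, 3)}]}),
    "not valid json",
]
rewards = [reward(s, targets) for s in samples]
advantages = group_advantages(rewards)
```

Because every term is computed directly from the generated JSON, the reward needs no learned reward model. A fuller version would add terms for the other constraints mentioned above, such as connectivity and overlap.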

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.