Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards

2026-05-15 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Architecture & Urban Planning · Depth: Expert, extended

Summary

Mila researchers introduce a text-based floor plan generation approach that fine-tunes a large language model (LLM) on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to improve adherence to topological and numerical constraints. The method targets a limitation of existing generative models, which often struggle to control room dimensions and areas precisely while still respecting the required room connectivity. Both the input (a bubble diagram plus design requirements such as room sizes) and the output (polygonal room layouts) are represented as JSON, which eases integration with CAD tools. The model, built on Llama-3.3-70B-Instruct, is trained in two stages: supervised fine-tuning followed by RLVR. It is reported to significantly outperform baselines such as House-GAN++ and HouseDiffusion on Compatibility (a 94% relative reduction), Area Adherence, Overlap, Aspect Ratio Adherence, Room Count Adherence, Realism, and Diversity.
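To make the JSON input/output idea concrete, here is a minimal sketch of what such a request and response could look like. The field names (`rooms`, `id`, `type`, `target_area`, `adjacency`, `polygon`), the coordinate units, and the helper `parse_layout` are all hypothetical; the summary does not give the paper's actual schema.

```python
import json

# Hypothetical input: a bubble diagram (rooms as nodes, adjacencies as edges)
# plus per-room target areas. Field names and units are assumptions.
request = {
    "rooms": [
        {"id": "living_1", "type": "living_room", "target_area": 20.0},
        {"id": "bed_1", "type": "bedroom", "target_area": 12.0},
    ],
    "adjacency": [["living_1", "bed_1"]],
}

# Hypothetical model output: one polygon (list of [x, y] vertices) per room.
response_text = json.dumps({
    "rooms": [
        {"id": "living_1", "polygon": [[0, 0], [5, 0], [5, 4], [0, 4]]},
        {"id": "bed_1", "polygon": [[5, 0], [9, 0], [9, 3], [5, 3]]},
    ]
})


def parse_layout(text):
    """Parse model output text into {room_id: [(x, y), ...]}."""
    layout = json.loads(text)
    return {
        room["id"]: [tuple(point) for point in room["polygon"]]
        for room in layout["rooms"]
    }


layout = parse_layout(response_text)

# Rooms requested in the input but absent from the output.
missing = [room["id"] for room in request["rooms"] if room["id"] not in layout]
print(missing)
```

Because both sides are plain JSON, a downstream tool can check the output against the request mechanically, without parsing free-form text or rasterized images.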

Key takeaway

For AI engineers building generative design tools, this research demonstrates a way to enforce strict numerical and topological constraints in LLM-based generation. Consider a two-stage fine-tuning approach (supervised fine-tuning, then RLVR) with structured JSON inputs and outputs when the application requires precise geometric control, close adherence to user specifications, and CAD compatibility.

Key insights

LLMs can generate highly constrained, structured floor plans by combining supervised fine-tuning with reinforcement learning and verifiable rewards.

Principles

Structured data input/output improves generative model reliability.
RL with verifiable rewards enhances constraint adherence.
Human feedback can measure subjective qualities like realism.

Method

The LLM is fine-tuned in two stages: first, supervised learning on real floor plans; then reinforcement learning with Group Relative Policy Optimization (GRPO), using a reward function that combines constraint adherence metrics with a hard feasibility condition that an output must satisfy to count as valid.
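A verifiable reward with a hard feasibility gate can be sketched as follows. This is an illustration under stated assumptions, not the paper's reward: the JSON field names, the two soft terms (room count and area adherence), and the equal weights are hypothetical. The group-relative normalization at the end follows the general GRPO idea of scoring each sampled completion against the mean and standard deviation of its group.

```python
import json
import statistics


def polygon_area(points):
    """Shoelace formula; points is a list of [x, y] vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def reward(completion, spec):
    """Return 0.0 for infeasible outputs, else a score in [0, 1].

    Assumes spec lists at least one room with a positive target_area.
    """
    # Hard feasibility gate: output must parse and contain usable polygons.
    try:
        rooms = json.loads(completion)["rooms"]
        areas = {room["name"]: polygon_area(room["polygon"]) for room in rooms}
    except (ValueError, KeyError, TypeError):
        return 0.0
    if not rooms or any(area <= 0 for area in areas.values()):
        return 0.0

    # Soft terms (hypothetical choice and weighting).
    targets = {room["name"]: room["target_area"] for room in spec["rooms"]}
    count_score = 1.0 if len(rooms) == len(targets) else 0.0
    area_scores = [
        max(0.0, 1.0 - abs(areas.get(name, 0.0) - target) / target)
        for name, target in targets.items()
    ]
    area_score = sum(area_scores) / len(area_scores)
    return 0.5 * count_score + 0.5 * area_score


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


spec = {"rooms": [{"name": "bedroom", "target_area": 12.0}]}
good = json.dumps(
    {"rooms": [{"name": "bedroom", "polygon": [[0, 0], [4, 0], [4, 3], [0, 3]]}]}
)
rewards = [reward(good, spec), reward("not json", spec)]
advantages = group_advantages(rewards)
```

Gating on feasibility first means a malformed or degenerate output can never earn partial credit from the soft terms, which keeps the reward checkable by a program rather than by a learned judge.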

In practice

Use JSON for unambiguous input/output in generative design.
Implement graph edit distance for connectivity validation.
Filter the RPLAN dataset for valid polygons and connected graphs.
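The connectivity check and dataset filter above can be sketched with the standard library alone. Note the substitution: instead of general graph edit distance, this uses the size of the symmetric difference between edge sets, which matches graph edit distance only when room identities are fixed and only edge insertions and deletions are counted. For the general case, a library routine such as NetworkX's `graph_edit_distance` could be used instead. The sample layout (`rooms`, `edges`) and all function names are hypothetical, and the polygon check is minimal: it rejects degenerate shapes but does not detect self-intersections.

```python
from collections import deque


def edge_edit_distance(required_edges, generated_edges):
    """Count edges present in one undirected graph but not the other."""
    required = {frozenset(edge) for edge in required_edges}
    generated = {frozenset(edge) for edge in generated_edges}
    return len(required ^ generated)


def is_connected(nodes, edges):
    """Breadth-first search over an undirected graph."""
    nodes = list(nodes)
    if not nodes:
        return False
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a not in adjacency or b not in adjacency:
            return False  # edge refers to an unknown room
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return len(seen) == len(nodes)


def is_valid_polygon(points):
    """At least three vertices and non-zero area (shoelace formula)."""
    if len(points) < 3:
        return False
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )
    return twice_area != 0


def keep_sample(sample):
    """Keep a sample only if every room polygon is valid and the graph is connected."""
    polygons_ok = all(is_valid_polygon(p) for p in sample["rooms"].values())
    return polygons_ok and is_connected(sample["rooms"], sample["edges"])


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sample = {"rooms": {"a": square, "b": square}, "edges": [("a", "b")]}
print(keep_sample(sample))
```

A distance of zero from `edge_edit_distance` means the generated adjacency matches the requested bubble diagram exactly under this simplified measure.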

Topics

Generative Floor Plan Design
Large Language Models
Reinforcement Learning with Verifiable Rewards
JSON Floor Plan Representation
Constraint Adherence Metrics

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.