2.5-D Decomposition for LLM-Based Spatial Construction
Summary
A neuro-symbolic pipeline built on "2.5-D decomposition" substantially improves the accuracy of large language models (LLMs) on spatial construction tasks. The LLM plans only in the two-dimensional horizontal plane, while a deterministic executor computes every vertical placement from column occupancy, so vertical-coordinate errors are removed from the LLM's output by construction. On the Build What I Mean (BWIM) benchmark (160 rounds), GPT-4o-mini with this pipeline reached 94.6% mean structural accuracy, outperforming GPT-4o (90.3%) and the best competing system (76.3%). An ablation study identified 2.5-D decomposition as the dominant contributor, accounting for a 50.7 percentage-point gain in accuracy. The pipeline also transferred to edge hardware without prompt modifications: Nemotron-3 120B running on an NVIDIA Jetson Thor AGX reached 94.5%, matching the cloud results. Results on 500 IGLU collaborative building tasks further indicate that the approach generalizes.
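The headline comparisons are absolute differences in percentage points (pp), which are not the same as relative percentages. The short Python sketch below recomputes those gaps using only the figures quoted in the summary; the constant names are descriptive labels chosen here, not identifiers from the paper.

```python
# Mean structural accuracy (%) as quoted in the summary (BWIM, 160 rounds).
PIPELINE = 94.6         # GPT-4o-mini with 2.5-D decomposition
GPT4O = 90.3            # GPT-4o, as reported
BEST_COMPETITOR = 76.3  # best competing system
JETSON = 94.5           # Nemotron-3 120B on Jetson Thor AGX


def pp_gap(a: float, b: float) -> float:
    """Absolute difference a - b in percentage points, rounded to 0.1."""
    return round(a - b, 1)


def relative_gain(a: float, b: float) -> float:
    """Relative improvement of a over b in percent, rounded to 0.1."""
    return round(100 * (a - b) / b, 1)


if __name__ == "__main__":
    print(f"vs GPT-4o:          {pp_gap(PIPELINE, GPT4O):+.1f} pp")
    print(f"vs best competitor: {pp_gap(PIPELINE, BEST_COMPETITOR):+.1f} pp "
          f"({relative_gain(PIPELINE, BEST_COMPETITOR):.1f}% relative)")
    print(f"cloud vs edge:      {pp_gap(PIPELINE, JETSON):+.1f} pp")
```

An 18.3 pp lead over the best competitor corresponds to roughly a 24% relative improvement. The 50.7 percentage-point ablation figure is likewise an absolute difference.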
Key takeaway
Research scientists developing autonomous construction or assembly systems should consider a 2.5-D decomposition strategy. Offloading spatial dimensions that can be computed deterministically (such as vertical placement under gravity) from the LLM to a dedicated executor removes a source of systematic coordinate errors and substantially improves accuracy and reliability, even with smaller, cheaper models such as GPT-4o-mini or models running on local edge hardware.
Key insights
Decomposing LLM spatial tasks by offloading deterministic dimensions to an executor drastically improves accuracy.
Principles
- Remove deterministic dimensions from LLM output space.
- Physical constraints can define deterministic dimensions.
- Systematic LLM errors can be corrected with peephole optimization.
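The summary does not say what the peephole edits look like. One plausible reading, borrowed from the compiler sense of peephole optimization (small, local rewrites), is to patch the prompt only for error categories that recur across rounds and leave the rest of the prompt untouched. The Python sketch below illustrates that reading; the error categories, the corrective lines, the `PEEPHOLE_RULES` table, the `peephole_patch` function, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical error categories mapped to short corrective prompt lines.
# The paper's actual categories and wording are not given in the summary.
PEEPHOLE_RULES: dict[str, str] = {
    "axis_swap": "x and z are the two horizontal axes; do not swap them.",
    "off_by_one": "Coordinates are 0-indexed.",
    "color_alias": "Use only the palette's color names, e.g. 'purple', not 'violet'.",
}


def peephole_patch(base_prompt: str, observed_errors: list[str], min_count: int = 3) -> str:
    """Return base_prompt plus one targeted line per error category that was
    observed at least min_count times; rare or unknown categories are ignored."""
    counts = Counter(observed_errors)
    patches = [
        PEEPHOLE_RULES[category]
        for category, n in counts.most_common()  # most frequent errors first
        if n >= min_count and category in PEEPHOLE_RULES
    ]
    if not patches:
        return base_prompt  # no systematic error: leave the prompt unchanged
    return base_prompt + "\n\nKnown pitfalls:\n" + "\n".join(f"- {p}" for p in patches)


if __name__ == "__main__":
    errors = ["axis_swap"] * 4 + ["off_by_one"] + ["color_alias"] * 3
    print(peephole_patch("You are a block-building assistant.", errors))
```

Only recurring (systematic) errors trigger a patch here, which keeps each prompt change small and attributable to a specific failure pattern.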
Method
The LLM plans block placements only in the horizontal (x, z) plane; a deterministic executor then computes each block's vertical (y) coordinate from the current occupancy of its column. Two further components complete the pipeline: handling of underspecified instructions and peephole prompt optimization.
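A minimal Python sketch of the executor side, assuming the LLM returns a JSON list of placements with only `x`, `z`, and `color` keys and that the build area is a bounded grid. The names `Placement2D`, `Block3D`, `parse_plan`, and `ColumnExecutor`, the JSON format, and the grid sizes are hypothetical, not the paper's interface. The point it shows is that y is never taken from the model: it is the current height of the target column, so each block lands on the ground or on the block beneath it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement2D:
    """What the LLM emits: horizontal position and color only, no y."""
    x: int
    z: int
    color: str


@dataclass(frozen=True)
class Block3D:
    x: int
    y: int
    z: int
    color: str


def parse_plan(llm_output: str) -> list[Placement2D]:
    """Parse a hypothetical JSON plan like [{"x": 1, "z": 2, "color": "red"}].
    Any "y" key is ignored: the vertical dimension is not in the LLM's output space."""
    return [Placement2D(int(d["x"]), int(d["z"]), str(d["color"]))
            for d in json.loads(llm_output)]


class ColumnExecutor:
    """Deterministic executor: y equals the number of blocks already in the
    (x, z) column, which is where a block comes to rest under gravity."""

    def __init__(self, width: int, depth: int, max_height: int) -> None:
        self.width, self.depth, self.max_height = width, depth, max_height
        self.heights: dict[tuple[int, int], int] = {}  # column -> occupied cells
        self.blocks: list[Block3D] = []

    def place(self, p: Placement2D) -> Block3D:
        if not (0 <= p.x < self.width and 0 <= p.z < self.depth):
            raise ValueError(f"({p.x}, {p.z}) is outside the build area")
        y = self.heights.get((p.x, p.z), 0)
        if y >= self.max_height:
            raise ValueError(f"column ({p.x}, {p.z}) is full")
        block = Block3D(p.x, y, p.z, p.color)
        self.heights[(p.x, p.z)] = y + 1
        self.blocks.append(block)
        return block


if __name__ == "__main__":
    plan = parse_plan('[{"x": 0, "z": 0, "color": "red"},'
                      ' {"x": 0, "z": 0, "color": "blue"},'
                      ' {"x": 1, "z": 0, "color": "red"}]')
    executor = ColumnExecutor(width=3, depth=3, max_height=4)
    for p in plan:
        print(executor.place(p))  # blue stacks on red at y=1; the second red sits at y=0
```

Because y depends only on how many blocks precede a placement in the same column, the same 2-D plan always produces the same 3-D structure. This sketch handles placements only.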
In practice
- Use 2.5-D decomposition for gravity-constrained construction.
- Implement peephole optimization for systematic LLM errors.
- Prioritize color over count for clarification questions.
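The summary gives only the ordering for clarification questions, not how the system detects that an instruction is underspecified. Assuming underspecification is tracked as missing slots, the preference can be sketched in Python as a fixed priority list; the slot names, question wording, and the `next_clarification` function are hypothetical.

```python
from typing import Optional

# Ordering taken from the summary: ask about color before count.
# Slot names and question wording are hypothetical illustrations.
CLARIFICATION_PRIORITY = ("color", "count")

QUESTION_TEMPLATES = {
    "color": "What color should the blocks be?",
    "count": "How many blocks should I place?",
}


def next_clarification(slots: dict) -> Optional[str]:
    """Return one question for the highest-priority missing slot, or None if
    the instruction is fully specified and the plan can go to the executor."""
    for slot in CLARIFICATION_PRIORITY:
        if slots.get(slot) is None:
            return QUESTION_TEMPLATES[slot]
    return None


if __name__ == "__main__":
    print(next_clarification({"color": None, "count": None}))   # asks about color
    print(next_clarification({"color": "red", "count": None}))  # asks about count
```

Asking a single question per turn, highest priority first, keeps the dialogue short when both attributes are missing.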
Topics
- 2.5-D Decomposition
- LLM Spatial Reasoning
- Neuro-Symbolic Architecture
- Build What I Mean Benchmark
- Peephole Prompt Optimization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.