2.5-D Decomposition for LLM-Based Spatial Construction

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

A neuro-symbolic pipeline built on "2.5-D decomposition" substantially improves the accuracy of large language models (LLMs) on spatial construction tasks. The LLM plans only in the two-dimensional horizontal plane, while a deterministic executor handles all vertical placements based on column occupancy, eliminating a common class of coordinate errors. On the Build What I Mean (BWIM) benchmark (160 rounds), GPT-4o-mini with this pipeline achieved 94.6% mean structural accuracy, outperforming GPT-4o at 90.3% and the best competing system at 76.3%. An ablation study identified 2.5-D decomposition as the dominant contributor, accounting for a 50.7 percentage point increase in accuracy. The pipeline also transferred to edge hardware: Nemotron-3 120B on an NVIDIA Jetson Thor AGX matched cloud results at 94.5% accuracy without prompt modifications. Generalizability was further demonstrated on 500 IGLU collaborative building tasks.

Key takeaway

Research scientists developing autonomous construction or assembly systems should consider a 2.5-D decomposition strategy. Offloading deterministically computable spatial dimensions (such as vertical placement under gravity) from the LLM to a dedicated executor reduces systematic coordinate errors and improves accuracy and reliability, even with smaller, more cost-effective models such as GPT-4o-mini or models running on local edge hardware.

Key insights

Decomposing LLM spatial tasks by offloading deterministic dimensions to an executor drastically improves accuracy.

Method

The LLM plans block placements in 2D (x, z) only, while a deterministic executor computes the vertical (y) coordinate of each block from column occupancy. The pipeline also includes underspecification handling and peephole prompt optimization.
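The split between 2D planning and deterministic vertical placement can be illustrated with a minimal sketch. This is not the authors' implementation: the `Plan2D` and `Block3D` schemas, the `execute_plan` function, the `ground_y` parameter, and the assumption that each block occupies exactly one unit cell and rests on top of whatever already fills its column are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Plan2D(NamedTuple):
    """One planner step (hypothetical schema): a horizontal cell and a colour."""
    x: int
    z: int
    color: str


class Block3D(NamedTuple):
    """A fully resolved block position produced by the executor."""
    x: int
    y: int
    z: int
    color: str


def execute_plan(plan, ground_y=0, occupied=()):
    """Assign y to each planned (x, z) step from column occupancy.

    `occupied` is an optional iterable of (x, y, z) cells already filled.
    Assumption: unit-sized blocks that stack directly on the highest
    occupied cell of their column, as they would under gravity.
    """
    # heights[(x, z)] = number of levels above ground_y already taken.
    heights = defaultdict(int)
    for x, y, z in occupied:
        heights[(x, z)] = max(heights[(x, z)], y - ground_y + 1)

    placed = []
    for step in plan:
        column = (step.x, step.z)
        y = ground_y + heights[column]   # next free level in this column
        placed.append(Block3D(step.x, y, step.z, step.color))
        heights[column] += 1             # the column is now one block taller
    return placed


# The planner (e.g. an LLM) only ever emits (x, z, color) triples.
plan = [Plan2D(0, 0, "red"), Plan2D(0, 0, "blue"), Plan2D(1, 0, "red")]
blocks = execute_plan(plan)
for block in blocks:
    print(block)
```

In this sketch the planner never emits a y value, so it cannot produce a floating or overlapping block; ordering within a column is the only vertical information it controls.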

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.