Plan2Map: A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records

2026-06-01 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Geospatial AI & Computer Vision · Depth: Expert, quick

Summary

Plan2Map is a 208-case multimodal benchmark for document-grounded geospatial boundary reconstruction from UK planning records. A system must reconstruct a valid geospatial boundary, expressed as GeoJSON, using only the source planning document: notice text, schedules, map plates, map labels, and boundary annotations. The authors also introduce GeoPlanAgent, a document-grounded system with geospatial tools in the loop that decomposes the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification. GeoPlanAgent achieves a mean IoU of 0.736 and a median IoU of 0.904 on Plan2Map, with 67.8% of predictions at or above 0.8 IoU, substantially outperforming direct VLM-to-GeoJSON baselines. Diagnostic analysis indicates that direct VLM prediction remains unreliable, that the remaining errors are concentrated in localisation and map registration, and that supervised boundary segmentation improves pixel-level mask quality.
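The three headline figures (mean IoU, median IoU, and the share of predictions at or above 0.8 IoU) are simple aggregates over per-case scores. The sketch below is illustrative only: it computes IoU on small binary masks rather than on geospatial polygons, since the summary does not say how the benchmark computes IoU, and the function names `mask_iou` and `summarise` and all sample values are made up for the example.

```python
from statistics import mean, median


def mask_iou(pred, truth):
    """Intersection over union of two binary masks (equal-sized lists of 0/1 rows)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Returning 0.0 when both masks are empty is a convention chosen for this sketch.
    return inter / union if union else 0.0


def summarise(ious, threshold=0.8):
    """Aggregate per-case IoU scores into the three summary statistics."""
    return {
        "mean_iou": mean(ious),
        "median_iou": median(ious),
        "share_at_or_above": sum(i >= threshold for i in ious) / len(ious),
    }


# Toy masks: 3 overlapping cells out of 5 covered by either mask.
pred = [[1, 1, 0], [1, 1, 0]]
truth = [[1, 1, 1], [1, 0, 0]]
print(mask_iou(pred, truth))

# Invented scores: three good cases and one failure.
print(summarise([0.95, 0.9, 0.85, 0.2]))
```

With these invented scores the mean is about 0.725 while the median is about 0.875. A single poor case pulls the mean well below the median, which is one plausible reading of the gap between the reported 0.736 mean and 0.904 median.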

Key takeaway

Computer vision engineers building systems for geospatial boundary reconstruction from planning records should favour multimodal approaches that decompose the task into distinct stages. Direct VLM-to-GeoJSON prediction is unreliable. Focus instead on localisation and map registration, where the remaining errors are concentrated, and consider supervised boundary segmentation, which improves pixel-level mask quality.

Key insights

Plan2Map evaluates whether a system can reconstruct a geospatial boundary from the mixed text and map content of a planning document. On this benchmark, a staged, tool-assisted pipeline outperforms direct VLM prediction.

Principles

Direct VLM prediction is unreliable for geospatial boundaries.
Decomposing a complex multimodal task into stages outperforms direct end-to-end prediction.
Supervised segmentation improves pixel-level mask quality.

Method

GeoPlanAgent reconstructs boundaries by decomposing the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification, integrating geospatial tools.
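To make the projection stage concrete, here is a minimal sketch of how a boundary outlined in map-plate pixel coordinates could be turned into a GeoJSON polygon once map registration has produced a pixel-to-world transform. This is not GeoPlanAgent's implementation, which the summary does not describe. The six-parameter affine transform, the function names, and every numeric value are assumptions for illustration, and the transform is assumed to map pixels directly to longitude and latitude, whereas a real pipeline would probably pass through a projected coordinate system first.

```python
import json


def pixel_to_world(col, row, transform):
    """Apply an affine transform (a, b, c, d, e, f):
    x = a*col + b*row + c,  y = d*col + e*row + f."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


def outline_to_geojson(pixel_ring, transform):
    """Project a pixel-space outline and wrap it as a GeoJSON Polygon feature."""
    ring = [list(pixel_to_world(col, row, transform)) for col, row in pixel_ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Hypothetical north-up registration: 0.001 degrees per pixel, origin at the top-left.
transform = (0.001, 0.0, -1.5, 0.0, -0.001, 53.0)

# Hypothetical outline in (col, row) pixels, ordered so the projected ring runs counterclockwise.
pixel_ring = [(0, 0), (0, 50), (100, 50), (100, 0)]

feature = outline_to_geojson(pixel_ring, transform)
print(json.dumps(feature, indent=2))
```

Because every vertex passes through the same transform, an error in registration shifts or skews the whole polygon even when the pixel outline is accurate.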

In practice

Use every element of the document as evidence: notice text, schedules, map plates, map labels, and boundary annotations.
Adopt a multi-stage pipeline for document-grounded reconstruction.
Prioritize improving localisation and map registration steps.
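A cheap check that fits the verification end of such a pipeline is a structural test of the output polygon before it is scored. The sketch below is an assumption about what that could look like, not the paper's verification stage, and `polygon_problems` is a hypothetical helper. It checks only ring length, ring closure, and longitude/latitude ranges. It does not detect self-intersections or confirm that the boundary is in the right place.

```python
def polygon_problems(geometry):
    """Return a list of structural problems; an empty list means none were found."""
    if geometry.get("type") != "Polygon":
        return ["geometry type is not Polygon"]
    rings = geometry.get("coordinates") or []
    if not rings:
        return ["polygon has no rings"]
    problems = []
    for i, ring in enumerate(rings):
        if len(ring) < 4:
            problems.append(f"ring {i} has fewer than 4 positions")
            continue
        if list(ring[0]) != list(ring[-1]):
            problems.append(f"ring {i} is not closed")
        for lon, lat, *_ in ring:
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"ring {i} has a position outside lon/lat range")
                break
    return problems


# Invented examples: one closed ring and one that never returns to its start.
closed = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
unclosed = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1]]]}
print(polygon_problems(closed))
print(polygon_problems(unclosed))
```

A failed check can send the case back to an earlier stage instead of emitting a boundary that downstream tools would reject.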

Topics

Plan2Map Benchmark
Geospatial Boundary Reconstruction
Multimodal AI
Document Analysis
Computer Vision
GeoJSON

Best for: AI Scientist, Computer Vision Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.