PCFootprint: A Large-Scale Dataset and Benchmark for Vectorized Building Footprint Extraction from Aerial LiDAR Point Clouds

2026-06-18 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Computer Vision · Depth: Expert, quick

Summary

PCFootprint introduces the first large-scale public dataset for vectorized building footprint extraction from airborne laser scanning (ALS) point clouds. Because it is built on LiDAR rather than optical imagery, it sidesteps inherent limitations of image-based methods such as occlusions, perspective distortions, and the lack of explicit elevation information. The dataset comprises 33,000 tiles derived from data of the Estonian Land and Spatial Development Board and covers diverse urban and rural landscapes; each tile spans 128 × 128 m and comes with systematically aligned vectorized footprints. It also includes a 3,000-tile cross-domain test set for evaluating generalization across geographic regions. Benchmarking mainstream methods on PCFootprint reveals significant challenges, including high intra-class variance, data imbalance, and noise in complex geospatial environments. The dataset is publicly available on Hugging Face.
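A cross-domain test set is only meaningful if its regions never leak into training. The sketch below illustrates that idea by splitting tiles on a region label rather than at random. The `Tile` record, its `region` field, and the function `region_disjoint_split` are hypothetical names for illustration; the summary does not say how PCFootprint actually defines its regions or split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    tile_id: str
    region: str  # hypothetical field: the geographic region a tile comes from


def region_disjoint_split(tiles, test_regions):
    """Split tiles so that no region appears in both train and test.

    `test_regions` is a hypothetical set of held-out region names; the
    actual PCFootprint cross-domain split may be defined differently.
    """
    test_regions = set(test_regions)
    train = [t for t in tiles if t.region not in test_regions]
    test = [t for t in tiles if t.region in test_regions]
    return train, test


if __name__ == "__main__":
    tiles = [
        Tile("t0", "north"),
        Tile("t1", "north"),
        Tile("t2", "south"),
        Tile("t3", "east"),
    ]
    train, test = region_disjoint_split(tiles, {"east"})
    print([t.tile_id for t in train])  # ['t0', 't1', 't2']
    print([t.tile_id for t in test])   # ['t3']
```

Splitting by region instead of by tile means a model scored on the held-out tiles has never seen that area's building styles or terrain, which is what "generalization across geographic regions" asks for.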

Key takeaway

Image-based building footprint extraction is limited by occlusions and the absence of explicit elevation information. Computer vision engineers working on this task should consider incorporating airborne LiDAR point clouds: PCFootprint provides a large-scale, diverse resource for training and benchmarking such models, which may improve robustness across varied urban and rural environments. Its cross-domain test set also offers a direct check on whether a model generalizes beyond the regions it was trained on.

Key insights

PCFootprint is the first large-scale LiDAR dataset for vectorized building footprint extraction, addressing optical imagery limitations.

Principles

LiDAR overcomes optical imagery limits.
Diverse datasets improve generalization.
Benchmarking reveals extraction challenges.

Method

The article describes building PCFootprint from 33,000 Estonian LiDAR tiles, each covering 128 × 128 m and paired with aligned vectorized footprints, along with a 3,000-tile cross-domain test set.
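Cutting a large LiDAR survey into fixed 128 m tiles amounts to binning each point by its planar coordinates. The following is a minimal sketch, assuming points are `(x, y, z)` tuples in a projected, metre-based coordinate system and that the tile grid starts at a chosen origin; the function names and the default origin are illustrative assumptions, not the dataset's actual tiling code.

```python
import math
from collections import defaultdict

TILE_SIZE_M = 128.0  # tile edge length stated for PCFootprint


def tile_index(x, y, origin_x=0.0, origin_y=0.0, size=TILE_SIZE_M):
    """Return the (column, row) of the square tile containing point (x, y).

    Assumes metric projected coordinates; the grid origin is an assumption.
    """
    return (math.floor((x - origin_x) / size), math.floor((y - origin_y) / size))


def bin_points(points, origin_x=0.0, origin_y=0.0, size=TILE_SIZE_M):
    """Group (x, y, z) points by tile; returns {(col, row): [points, ...]}."""
    tiles = defaultdict(list)
    for x, y, z in points:
        tiles[tile_index(x, y, origin_x, origin_y, size)].append((x, y, z))
    return dict(tiles)


if __name__ == "__main__":
    points = [
        (10.0, 5.0, 31.2),
        (127.9, 0.0, 30.0),  # still inside the first tile
        (128.0, 0.0, 29.5),  # exactly on the boundary: goes to the next tile
        (-1.0, 3.0, 28.0),   # negative offset: floor() gives column -1
    ]
    tiles = bin_points(points)
    print(sorted(tiles))  # [(-1, 0), (0, 0), (1, 0)]
```

Using `math.floor` rather than `int()` keeps tile indices consistent for points left of or below the origin, since `int()` would truncate toward zero and merge two different tiles.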

In practice

Use PCFootprint for building modeling.
Evaluate methods on diverse landscapes.
Account for data imbalance when training on LiDAR data.
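One common way to address the data imbalance noted in the benchmark is to weight the loss by inverse class frequency. This is a generic technique, not necessarily what the PCFootprint authors used; the sketch assumes hypothetical per-point labels (for example 1 for building, 0 for non-building) derived from the footprints, and the function name is illustrative.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count).

    Rarer classes receive larger weights, so every class contributes the
    same total weight (total / num_classes) to a weighted loss.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    k = len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical tile: 90 non-building points (0) and 10 building points (1).
    labels = [0] * 90 + [1] * 10
    w = inverse_frequency_weights(labels)
    print(w[1])  # 5.0
```

These weights can be passed to any loss that accepts per-class weights; resampling tiles that contain rare building types is an alternative when imbalance is across tiles rather than within them.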

Topics

Building Footprint Extraction
LiDAR Point Clouds
Aerial Laser Scanning
Geospatial Analysis
Urban Scene Understanding
Dataset Benchmark

Best for: AI Scientist, Computer Vision Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.