Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

2026-06-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

Co-PLNet is a point-line collaborative framework for wireframe parsing that addresses mismatches between independently predicted line segments and junctions. Developed by Chao Wang et al., the network exchanges spatial cues between the two tasks through a Point–Line Prompt Encoder (PLP-Encoder) and a Cross-Guidance Line Decoder (CGL-Decoder). The PLP-Encoder converts early detections into spatial prompts by encoding their geometric attributes into spatially aligned maps. The CGL-Decoder then refines the predictions with sparse attention conditioned on these prompts, enforcing point–line consistency. On the Wireframe dataset (5,000 training and 462 test images) and the YorkUrban dataset (102 test images), Co-PLNet is reported to improve accuracy and robustness while running in real time at 76.8 FPS (about 13 ms per frame) on an NVIDIA RTX 4080 GPU. The model takes 512×512 input images and uses 16-channel prompts and 32-channel attention with 4 heads and an 8-pixel window.

Key takeaway

For computer vision engineers building SLAM or scene-parsing systems, Co-PLNet offers a way to keep detected line segments and junctions geometrically consistent. Its collaborative point-line approach reduces endpoint mismatches, which matters for downstream tasks that depend on accurate structure. The prompt-guided framework is worth evaluating where both accuracy and real-time performance are required, especially when precise structural geometry is critical.
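
To make "endpoint mismatch" concrete, here is a minimal sketch of one way such a rate could be measured: the fraction of line endpoints that have no detected junction within a pixel tolerance. The function name, the 2-pixel tolerance, and the metric definition itself are illustrative assumptions, not the evaluation protocol used by the Co-PLNet authors.

```python
import math


def endpoint_mismatch_rate(lines, junctions, tol=2.0):
    """Fraction of line endpoints with no junction within `tol` pixels.

    `lines` is a list of ((x1, y1), (x2, y2)) segments and `junctions`
    is a list of (x, y) points. Both the metric and the default
    tolerance are hypothetical, chosen only for illustration.
    """
    endpoints = [p for seg in lines for p in seg]
    if not endpoints:
        return 0.0

    def matched(p):
        # An endpoint counts as matched if any junction is close enough.
        return any(math.dist(p, j) <= tol for j in junctions)

    unmatched = sum(1 for p in endpoints if not matched(p))
    return unmatched / len(endpoints)


# Two segments sharing a corner; the far end of the second has no junction.
lines = [((10, 10), (50, 10)), ((50, 10), (50, 42))]
junctions = [(10, 10), (50, 11)]
rate = endpoint_mismatch_rate(lines, junctions)
```

In this toy case three of the four endpoints lie within tolerance of a junction, so a consistency-focused parser would aim to drive the remaining unmatched share toward zero.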

Key insights

Co-PLNet improves wireframe parsing by collaboratively exchanging spatial prompts between line and junction detection tasks for geometric consistency.

Principles

Bidirectional interaction enhances geometric coherence.
Sparse attention balances accuracy and real-time performance.
Prompt-based encoding fuses line and junction cues.
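
The trade-off behind the sparse-attention principle can be illustrated by counting query-key pairs. The sketch below compares global attention with attention restricted to non-overlapping square windows. The 128×128 feature-map size and the non-overlapping window layout are assumptions made for the arithmetic; the summary states only the 8-pixel window size.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs scored by one attention head.

    With `window=None` every token attends to every token. Otherwise
    each token attends only to the tokens in its own non-overlapping
    window x window block (an assumed layout, for illustration).
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("map size must be divisible by the window size")
    return tokens * window * window


H = W = 128  # hypothetical feature-map resolution, not from the source
dense = attention_pairs(H, W)
sparse = attention_pairs(H, W, window=8)
ratio = dense // sparse
```

Under these assumptions the windowed variant scores 256 times fewer pairs than global attention, which is the kind of saving that makes real-time inference plausible.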

Method

Co-PLNet uses a PLP-Encoder to convert initial line/junction predictions into spatial prompts, which then guide a CGL-Decoder via sparse multi-head cross-attention for refined, consistent line segment predictions.
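
The idea of turning early detections into spatially aligned prompt maps can be sketched as a simple rasterisation step. The example below is a hypothetical illustration, not the PLP-Encoder itself: it writes integer-pixel junctions and line segments into three aligned maps (junction occupancy, line occupancy, line orientation). The function name, the choice of attributes, and the three-map layout are assumptions; the actual model uses learned 16-channel prompts.

```python
import math


def make_prompt_maps(junctions, lines, size):
    """Rasterise detections into three aligned size x size maps.

    Returns (junction occupancy, line occupancy, line orientation),
    where orientation is in radians within [0, pi). Coordinates are
    assumed to be integer pixels inside the map.
    """
    junc = [[0.0] * size for _ in range(size)]
    line = [[0.0] * size for _ in range(size)]
    angle = [[0.0] * size for _ in range(size)]

    for x, y in junctions:
        junc[y][x] = 1.0

    for (x1, y1), (x2, y2) in lines:
        # Orientation is undirected, so fold the angle into [0, pi).
        theta = math.atan2(y2 - y1, x2 - x1) % math.pi
        steps = max(abs(x2 - x1), abs(y2 - y1), 1)
        for i in range(steps + 1):
            t = i / steps
            x = round(x1 + t * (x2 - x1))
            y = round(y1 + t * (y2 - y1))
            line[y][x] = 1.0
            angle[y][x] = theta  # later lines overwrite shared pixels

    return junc, line, angle


junctions = [(0, 0), (4, 0), (4, 4)]
lines = [((0, 0), (4, 0)), ((4, 0), (4, 4))]
junc_map, line_map, angle_map = make_prompt_maps(junctions, lines, size=8)

# Point-line consistency check: every junction sits on a line pixel.
consistent = all(line_map[y][x] == 1.0 for x, y in junctions)
```

Because all maps share one pixel grid, a decoder can read junction and line cues at the same location, which is the property the cross-guidance step relies on.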

In practice

Use an 8-pixel window for sparse attention.
Train with the Adam optimizer at a learning rate of 4e-4 for 35 epochs.
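
The settings reported in this summary can be collected into a single configuration object, which also makes a couple of derived quantities easy to check. This is a sketch only: the class name, field names, and the batch size used in the example are hypothetical, while the numeric defaults are the values quoted above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CoPLNetConfig:
    """Hypothetical container for the hyperparameters quoted above."""

    image_size: int = 512
    prompt_channels: int = 16
    attention_channels: int = 32
    attention_heads: int = 4
    attention_window: int = 8
    optimizer: str = "adam"
    learning_rate: float = 4e-4
    epochs: int = 35

    def head_dim(self):
        # Channels must split evenly across attention heads.
        if self.attention_channels % self.attention_heads:
            raise ValueError("channels must be divisible by heads")
        return self.attention_channels // self.attention_heads

    def total_steps(self, train_images, batch_size):
        # Optimizer steps over the full schedule, last batch included.
        return self.epochs * math.ceil(train_images / batch_size)


cfg = CoPLNetConfig()
per_head = cfg.head_dim()
# 5,000 Wireframe training images; batch size 8 is an assumed value.
steps = cfg.total_steps(train_images=5000, batch_size=8)
```

With 32 channels over 4 heads, each head works on 8 channels. The step count depends entirely on the assumed batch size, which the summary does not report.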

Topics

Wireframe Parsing
Computer Vision
SLAM
Point-Line Networks
Spatial Prompts
Sparse Attention

Code references

GalacticHogrider/Co-PLNet

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.