Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing
Summary
Co-PLNet is a point-line collaborative framework for wireframe parsing that addresses mismatches between independently predicted line segments and junctions. The network, developed by Chao Wang et al., exchanges spatial cues between the two tasks using a Point–Line Prompt Encoder (PLP-Encoder) and a Cross-Guidance Line Decoder (CGL-Decoder). The PLP-Encoder converts early detections into spatial prompts by encoding their geometric attributes into aligned maps. The CGL-Decoder then refines the predictions with sparse attention conditioned on these prompts, which enforces point–line consistency. On the Wireframe (5,000 training and 462 test images) and YorkUrban (102 test images) datasets, Co-PLNet is reported to improve both accuracy and robustness while running in real time at 76.8 FPS on an NVIDIA RTX 4080 GPU. The model takes 512×512 input images and uses 16-channel prompts and 32-channel attention with 4 heads and an 8-pixel window.
Key takeaway
For computer vision engineers developing SLAM or scene parsing systems, Co-PLNet offers a robust approach to wireframe parsing. Its collaborative point-line design is reported to reduce endpoint mismatch rates, improving the geometric consistency that downstream tasks depend on. Consider integrating this prompt-guided framework where both accuracy and real-time performance matter, especially in applications that need precise structural geometry.
Key insights
Co-PLNet improves wireframe parsing by collaboratively exchanging spatial prompts between line and junction detection tasks for geometric consistency.
Principles
- Bidirectional interaction enhances geometric coherence.
- Sparse attention balances accuracy and real-time performance.
- Prompt-based encoding fuses line and junction cues.
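To make the prompt-based encoding principle concrete, here is a minimal sketch of how early junction and line detections could be written into spatially aligned maps. It is an illustration only, not the paper's PLP-Encoder: the function name `rasterize_prompts`, the binary encoding, the 16×16 grid, and the point-sampling rasterizer are all assumptions made for this example.

```python
def rasterize_prompts(junctions, segments, size=16, samples=32):
    """Build two aligned size x size maps from early detections.

    junctions: list of (x, y) points.
    segments:  list of ((x1, y1), (x2, y2)) endpoint pairs.
    Returns (junction_map, line_map) as nested lists of 0.0 / 1.0.
    Hypothetical helper: the binary encoding is an assumption.
    """
    junction_map = [[0.0] * size for _ in range(size)]
    line_map = [[0.0] * size for _ in range(size)]

    def mark(grid, x, y):
        # Snap to the nearest cell and ignore anything off the grid.
        col, row = int(round(x)), int(round(y))
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1.0

    for x, y in junctions:
        mark(junction_map, x, y)

    for (x1, y1), (x2, y2) in segments:
        # Sample points along the segment and mark each visited cell.
        for i in range(samples + 1):
            t = i / samples
            mark(line_map, x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    return junction_map, line_map


# One horizontal segment whose endpoints are also detected junctions.
junction_map, line_map = rasterize_prompts(
    junctions=[(2, 2), (10, 2)],
    segments=[((2, 2), (10, 2))],
)
```

Because both maps share one pixel grid, a cell that is set in `junction_map` and lies at the end of a run in `line_map` marks an endpoint on which the two tasks agree, which is the kind of cue a collaborative decoder can exploit.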
Method
Co-PLNet uses a PLP-Encoder to convert initial line/junction predictions into spatial prompts, which then guide a CGL-Decoder via sparse multi-head cross-attention for refined, consistent line segment predictions.
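The decoder step described above can be sketched with NumPy as window-local multi-head cross-attention, where feature-map positions act as queries and prompt positions in the same window act as keys and values. Several things here are assumptions rather than details from the paper: reading "sparse attention" as non-overlapping window attention, omitting the learned query/key/value projections, assuming the prompts have already been projected to the attention width, and the function names themselves.

```python
import numpy as np


def window_partition(x, window):
    """Split (H, W, C) into (num_windows, window*window, C) non-overlapping tiles."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)


def window_merge(tiles, window, h, w):
    """Inverse of window_partition: rebuild the (H, W, C) map from tiles."""
    c = tiles.shape[-1]
    x = tiles.reshape(h // window, w // window, window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def windowed_cross_attention(features, prompts, window=8, heads=4):
    """Refine features with prompts; attention never crosses a window border.

    features: (H, W, C) array used as queries.
    prompts:  (H, W, C) array used as keys and values.
    Learned projections are left out to keep the sketch short.
    """
    h, w, c = features.shape
    assert h % window == 0 and w % window == 0 and c % heads == 0
    d = c // heads

    q = window_partition(features, window)   # (N, T, C)
    kv = window_partition(prompts, window)   # (N, T, C)
    n, t, _ = q.shape

    # Split channels into heads: (N, heads, T, d).
    qh = q.reshape(n, t, heads, d).transpose(0, 2, 1, 3)
    kh = kv.reshape(n, t, heads, d).transpose(0, 2, 1, 3)

    # Scaled dot-product scores, then a numerically stable softmax over keys.
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d)   # (N, heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = (weights @ kh).transpose(0, 2, 1, 3).reshape(n, t, c)
    # Residual connection: the prompts adjust the features, not replace them.
    return features + window_merge(out, window, h, w)


rng = np.random.default_rng(0)
features = rng.standard_normal((64, 64, 32))
prompts = rng.standard_normal((64, 64, 32))
refined = windowed_cross_attention(features, prompts, window=8, heads=4)
```

With an 8-pixel window each query attends to 64 prompt positions instead of the full map, which is where the saving over dense attention comes from in this sketch.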
In practice
- Use an 8-pixel window for sparse attention.
- Employ the Adam optimizer with a learning rate of 4e-4 for 35 epochs.
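The reported settings can be gathered into a small configuration object, sketched below with only the standard library. The class name `TrainConfig`, its field names, and the `windows_for` helper are hypothetical; the values are the ones quoted in this summary. The summary does not say at which feature-map resolution the 8-pixel window is applied, so the map size is left as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the summary."""
    image_size: int = 512
    prompt_channels: int = 16
    attn_channels: int = 32
    attn_heads: int = 4
    window_size: int = 8
    optimizer: str = "adam"
    learning_rate: float = 4e-4
    epochs: int = 35

    def __post_init__(self):
        # Multi-head attention needs the channels to split evenly across heads.
        if self.attn_channels % self.attn_heads != 0:
            raise ValueError("attn_channels must be divisible by attn_heads")

    @property
    def head_dim(self):
        """Channels handled by each attention head."""
        return self.attn_channels // self.attn_heads

    @property
    def tokens_per_window(self):
        """Positions each query can attend to inside one window."""
        return self.window_size ** 2

    def windows_for(self, map_size):
        """Number of non-overlapping windows on a square map of side map_size."""
        if map_size % self.window_size != 0:
            raise ValueError("map_size must be divisible by window_size")
        return (map_size // self.window_size) ** 2


cfg = TrainConfig()
```

Under these values each head works on 8 channels and each window holds 64 positions; a 512-pixel map would split into 4,096 windows, and a 128-pixel map into 256.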
Topics
- Wireframe Parsing
- Computer Vision
- SLAM
- Point-Line Networks
- Spatial Prompts
- Sparse Attention
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.