Zero-Shot Polygon Matching with Pre-trained Models for Pose Estimation and Polygon Cloud from Challenging Stereo
Summary
U(PM)2 is an unsupervised polygon matching framework for stereo images that requires no training. It targets three challenges: disparity discontinuity, scale variation, and generalization. The pipeline runs in stages: the Segment Anything Model (SAM) produces masks, which are vectorized into polygons; a global matcher combines a bidirectional-pyramid strategy with LoFTR to handle viewpoint and scale changes; and a local matcher, local-joint geometry and multi-feature matching (LoJoGM), uses the Hungarian algorithm to handle local discontinuities. On the ScanNet and SceneFlow benchmarks, U(PM)2 achieved leading accuracy (87.50% on SceneFlow) at competitive speed. It outperformed MESA, SGAM, and MASA by 28.29% in Matching Precision (MP) when combined with SuperPoint and LightGlue. It also handles large-format imagery effectively.
Key takeaway
For computer vision engineers building stereo matching pipelines, U(PM)2 offers a training-free, accurate method for polygon matching, which is useful for urban reconstruction and detailed 3D modeling. Its components are modular: SAM supplies the masks, the pyramid strategy and LoFTR handle scale variation, and LoJoGM handles local discontinuities, giving leading accuracy at competitive speed without any training.
Key insights
Unsupervised polygon matching for stereo images is achievable by integrating pre-trained models with handcrafted features.
Principles
- Polygon matching extends image matching to higher semantic levels.
- Integrating learned and handcrafted features improves robustness.
- Bidirectional pyramids optimize search efficiency and accuracy.
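The summary does not spell out how the bidirectional pyramid works, so the following is only a minimal sketch of one plausible reading: both images get a resolution pyramid, and every left level is compared against every right level so that a scale difference in either direction can be absorbed. The names `pyramid_sizes`, `best_level_pair`, and the `count_matches` callback are hypothetical; in a real pipeline the callback would wrap a matcher such as LoFTR.

```python
from itertools import product


def pyramid_sizes(width, height, levels, min_side=32):
    """Return (width, height) per pyramid level, halving at each level.

    Stops early once the shorter side would drop below min_side.
    """
    sizes = []
    for k in range(levels):
        w, h = width >> k, height >> k
        if min(w, h) < min_side:
            break
        sizes.append((w, h))
    return sizes


def best_level_pair(left_sizes, right_sizes, count_matches):
    """Score every (left level, right level) pair and keep the best one.

    count_matches is a hypothetical callback taking two (width, height)
    tuples and returning a match count. Ties keep the first pair found.
    """
    best = None
    pairs = product(enumerate(left_sizes), enumerate(right_sizes))
    for (i, left_size), (j, right_size) in pairs:
        score = count_matches(left_size, right_size)
        if best is None or score > best[0]:
            best = (score, i, j)
    return best


# Toy scorer (assumption): matching only works when both levels have the
# same size, and larger levels yield more matches.
def toy_count(left_size, right_size):
    return left_size[0] if left_size == right_size else 0


left = pyramid_sizes(4096, 4096, levels=4)
right = pyramid_sizes(2048, 2048, levels=4)
score, left_level, right_level = best_level_pair(left, right, toy_count)
```

With the toy scorer, the left image is matched one level down against the full-resolution right image. Scoring all level pairs is quadratic in the number of levels, which stays cheap because pyramids are shallow.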
Method
U(PM)2 first detects polygons and points, then matches them globally with a bidirectional-pyramid strategy and LoFTR, and finally refines the matches locally with LoJoGM, which applies the Hungarian algorithm to geometric and texture correlations.
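The local matching step described above can be illustrated as a one-to-one assignment problem. This is a minimal sketch, not the LoJoGM formulation: the cost terms (an area ratio for geometry, cosine distance between descriptors for texture), the weights, the `max_cost` threshold, and the `area`/`desc` fields are all assumptions chosen for illustration. The `hungarian` function is a standard O(n^3) implementation of the Hungarian algorithm.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment; requires len(cost) <= len(cost[0]).

    Returns sorted (row, col) pairs. Internally 1-indexed, with column 0
    used as a virtual start node.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j] = row currently assigned to column j
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip assignments along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def pair_cost(a, b, w_geo=0.5, w_tex=0.5):
    """Hypothetical cost in [0, 1]: area mismatch plus descriptor distance."""
    geo = 1.0 - min(a["area"], b["area"]) / max(a["area"], b["area"])
    dot = sum(x * y for x, y in zip(a["desc"], b["desc"]))
    norm_a = math.sqrt(sum(x * x for x in a["desc"]))
    norm_b = math.sqrt(sum(x * x for x in b["desc"]))
    tex = 0.5 * (1.0 - dot / (norm_a * norm_b))
    return w_geo * geo + w_tex * tex


def match_polygons(left, right, max_cost=0.5):
    """Return sorted (left index, right index) pairs with cost <= max_cost."""
    if not left or not right:
        return []
    swap = len(left) > len(right)    # hungarian() needs rows <= cols
    rows, cols = (right, left) if swap else (left, right)
    cost = [[pair_cost(r, c) for c in cols] for r in rows]
    kept = [(i, j) for i, j in hungarian(cost) if cost[i][j] <= max_cost]
    return sorted((j, i) if swap else (i, j) for i, j in kept)


left = [{"area": 100.0, "desc": [1.0, 0.0]},
        {"area": 50.0, "desc": [0.0, 1.0]}]
right = [{"area": 52.0, "desc": [0.0, 1.0]},
         {"area": 98.0, "desc": [1.0, 0.0]},
         {"area": 10.0, "desc": [-1.0, 0.0]}]
matches = match_polygons(left, right)
```

Here `matches` pairs each left polygon with the right polygon of similar area and descriptor, and the third right polygon stays unmatched. The threshold matters because an optimal assignment will otherwise force a pairing even when no good candidate exists.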
In practice
- Use SAM for zero-shot instance segmentation, then vectorize the masks into polygons.
- Apply bidirectional-pyramid matching for efficient large-format image processing.
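The summary does not say how the masks are vectorized. As a minimal sketch of one common approach, the dense boundary of a mask can be reduced to a few polygon vertices with Douglas-Peucker simplification. The function names, the `tol` parameter, and the assumption that the boundary is already traced as an ordered ring of (x, y) points are illustrative, not taken from the paper.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))        # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_chain(points, tol):
    """Douglas-Peucker on an open chain; always keeps both endpoints."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tol:
        return [a, b]
    head = simplify_chain(points[:idx + 1], tol)
    tail = simplify_chain(points[idx:], tol)
    return head[:-1] + tail          # drop the duplicated split point


def simplify_ring(ring, tol):
    """Simplify a closed boundary given without a repeated closing point.

    The ring is split at the vertex farthest from ring[0], and the two
    open chains are simplified independently.
    """
    if len(ring) < 3:
        return list(ring)
    x0, y0 = ring[0]
    k = max(range(1, len(ring)),
            key=lambda i: math.hypot(ring[i][0] - x0, ring[i][1] - y0))
    first = simplify_chain(ring[:k + 1], tol)
    second = simplify_chain(ring[k:] + [ring[0]], tol)
    return first[:-1] + second[:-1]


# Pixel-by-pixel boundary of a 4 x 4 square mask, traced in order.
boundary = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)] +
            [(x, 4) for x in range(4, 0, -1)] +
            [(0, y) for y in range(4, 0, -1)])
polygon = simplify_ring(boundary, tol=0.5)
```

On this toy boundary the 16 traced points collapse to the four corners of the square. A larger `tol` gives coarser polygons, which is a trade-off between matching speed and boundary fidelity.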
Topics
- Polygon Matching
- Stereo Image Matching
- Unsupervised Learning
- Pre-trained Models
- LoFTR
- Segment Anything Model
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.