Zero-Shot Polygon Matching with Pre-trained Models for Pose Estimation and Polygon Cloud from Challenging Stereo

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

U(PM)2 is an unsupervised polygon matching framework for stereo images that addresses disparity discontinuity, scale variation, and generalization without requiring any training. It uses a multi-stage pipeline: the Segment Anything Model (SAM) produces masks, which are vectorized into polygons; a global matcher combines a bidirectional-pyramid strategy with LoFTR to handle viewpoint and scale changes; and a local matcher, local-joint geometry and multi-feature matching (LoJoGM), uses the Hungarian algorithm to handle local discontinuities. Benchmarked on ScanNet and SceneFlow, U(PM)2 achieved leading accuracy (87.50% on SceneFlow) at competitive speed, outperforming MESA, SGAM, and MASA by 28.29% in Matching Precision (MP) when combined with SuperPoint and LightGlue. It also handles large-format imagery effectively.

Key takeaway

For computer vision engineers building robust stereo matching pipelines, U(PM)2 offers a training-free, accurate method for polygon matching, which matters for urban reconstruction and detailed 3D modeling. Its components are modular: SAM supplies the polygons, while the bidirectional-pyramid strategy with LoFTR handles scale variation and LoJoGM handles local discontinuities, so you can reach the reported accuracy at competitive speed without collecting training data.

Key insights

Unsupervised polygon matching for stereo images is achievable by integrating pre-trained models with handcrafted features.

Principles

Polygon matching extends image matching to higher semantic levels.
Integrating learned and handcrafted features improves robustness.
Bidirectional pyramids optimize search efficiency and accuracy.
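
The summary does not spell out how the bidirectional pyramid is built, so the following is only a minimal sketch under one plausible reading: each image gets its own resolution pyramid, and for a candidate polygon pair the view in which the polygon appears larger is downsampled until both appear at a similar scale. The function names, the `min_side` cutoff, and the area-based scale estimate are assumptions made for illustration, not details from the paper.

```python
import math


def pyramid_shapes(width, height, min_side=256):
    """Return (width, height) for each pyramid level, halving per level.

    Halving stops before the shorter side would fall below `min_side`
    (a hypothetical cutoff chosen for this sketch).
    """
    shapes = [(width, height)]
    while min(shapes[-1]) // 2 >= min_side:
        w, h = shapes[-1]
        shapes.append((w // 2, h // 2))
    return shapes


def level_pair_for_scale(area_left, area_right, max_level):
    """Pick (level_left, level_right) so a polygon has similar size in both views.

    Each level halves the linear size, so the level shift is log2 of the
    linear scale ratio, i.e. log2(sqrt(area_left / area_right)).
    """
    ratio = math.sqrt(area_left / area_right)
    shift = round(math.log2(ratio))
    shift = max(-max_level, min(max_level, shift))
    # Downsample whichever view shows the polygon larger; keep the other at level 0.
    return (shift, 0) if shift > 0 else (0, -shift)


# A large-format image yields a handful of levels to search coarse-to-fine.
shapes = pyramid_shapes(8192, 6144)
# A polygon 16x larger in area on the left is 4x larger linearly: two levels.
pair = level_pair_for_scale(1600.0, 100.0, max_level=len(shapes) - 1)
```

Working in both directions means neither image is treated as the fixed reference scale, which is one way a search could stay cheap on large inputs while still covering strong scale differences.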

Method

U(PM)2 detects polygons/points, matches them globally with a bidirectional-pyramid strategy and LoFTR, then refines the matches locally using LoJoGM, which applies the Hungarian algorithm to geometric and texture correlations.
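
To make the local matching step concrete, here is a minimal sketch of one-to-one polygon assignment with the Hungarian algorithm. It is not the LoJoGM formulation: the cost below uses only two geometric cues (centroid distance and log area ratio) with made-up weights, and a texture term would have to be added as a further summand. All function names and weights are hypothetical, and polygons are assumed to be non-degenerate (non-zero area).

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns `match`, where match[i] is the column assigned to row i.
    Standard potentials-based formulation, O(n^2 * m).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the augmenting path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match


def polygon_features(poly):
    """Return (centroid_x, centroid_y, area) using the shoelace formula."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a), abs(a)


def pair_cost(pa, pb, w_pos=1.0, w_area=50.0):
    """Geometric dissimilarity; the weights are illustrative assumptions."""
    ax, ay, aa = polygon_features(pa)
    bx, by, ba = polygon_features(pb)
    return w_pos * math.hypot(ax - bx, ay - by) + w_area * abs(math.log(aa / ba))


def match_polygons(left, right):
    """Assign each left polygon to a distinct right polygon (len(left) <= len(right))."""
    return hungarian([[pair_cost(a, b) for b in right] for a in left])


def square(x, y, s):
    return [(x, y), (x + s, y), (x + s, y + s), (x, y + s)]


left = [square(0, 0, 10), square(100, 0, 20)]
right = [square(98, 0, 20), square(3, 0, 10), [(200, 200), (210, 200), (200, 210)]]
print(match_polygons(left, right))  # [1, 0]
```

Solving the assignment jointly, rather than taking each polygon's nearest neighbour, prevents two left polygons from claiming the same right polygon, which is the usual reason to reach for the Hungarian algorithm in a refinement stage.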

In practice

Use SAM for zero-shot instance segmentation and mask vectorization.
Apply bidirectional-pyramid matching for efficient large-format image processing.
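
The summary does not say how masks are vectorized, so the sketch below swaps in a common choice, Douglas-Peucker simplification, purely as an illustration. It assumes the boundary of a SAM mask has already been traced into an ordered ring of pixel coordinates; the function names and the `tolerance` value are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_polyline(points, tolerance):
    """Douglas-Peucker on an open polyline; endpoints are always kept."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest point and recurse on both halves.
    head = simplify_polyline(points[: idx + 1], tolerance)
    tail = simplify_polyline(points[idx:], tolerance)
    return head[:-1] + tail


def simplify_ring(ring, tolerance):
    """Simplify a closed contour given without a repeated first vertex."""
    # Split the ring at the vertex farthest from ring[0] into two open polylines.
    far = max(range(len(ring)), key=lambda i: math.dist(ring[0], ring[i]))
    a = simplify_polyline(ring[: far + 1], tolerance)
    b = simplify_polyline(ring[far:] + ring[:1], tolerance)
    return a[:-1] + b[:-1]


# Dense boundary of a 10 x 10 square, one point per pixel step (40 points).
ring = (
    [(x, 0) for x in range(10)]
    + [(10, y) for y in range(10)]
    + [(x, 10) for x in range(10, 0, -1)]
    + [(0, y) for y in range(10, 0, -1)]
)
print(simplify_ring(ring, tolerance=0.5))  # [(0, 0), (10, 0), (10, 10), (0, 10)]
```

A larger tolerance gives coarser polygons with fewer vertices, which is cheaper to match but may erase small structures, so the value would need tuning to image resolution.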

Topics

Polygon Matching
Stereo Image Matching
Unsupervised Learning
Pre-trained Models
LoFTR
Segment Anything Model

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.