Honey, I Shrunk the Arc de Triomphe!
Summary
A new dataset, MetricScenes, targets the "scale-collapse" phenomenon in metric-scale monocular geometry estimation, in which foundation models underestimate the size of distant landmarks and vast landscapes. The authors hypothesize that the problem stems from limitations in existing training data: hardware-constrained LiDAR, short-range indoor scans, or synthetic data that lacks real-world complexity. MetricScenes is curated from diverse sources, including Internet photo collections and stereo imagery, with camera poses and initial depth maps estimated by off-the-shelf methods. Absolute scale is recovered from geo-tagged metadata and known stereo camera baselines, and depth map quality is further improved by a two-stage Poisson completion method. Fine-tuning the MoGe-2 model on MetricScenes substantially mitigates scale-collapse, yielding higher metric accuracy in unconstrained, open-domain scenes while preserving strong performance on standard benchmarks.
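For the stereo portion of the data, a known baseline turns disparity into metric depth through the standard rectified-stereo relation Z = f * B / d. The sketch below assumes rectified image pairs, a focal length f in pixels and a baseline B in meters. The helper names `disparity_to_depth` and `align_scale` are hypothetical and are not taken from the MetricScenes pipeline; the second helper shows one common way (a median ratio) to lift an up-to-scale depth map onto such a metric reference.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disp=1e-6):
    """Metric depth from rectified-stereo disparity: Z = f * B / d (hypothetical helper)."""
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(d, np.nan)      # invalid / zero disparity stays NaN
    ok = d > min_disp
    depth[ok] = focal_px * baseline_m / d[ok]
    return depth

def align_scale(rel_depth, metric_ref):
    """Scale an up-to-scale depth map to a metric reference via the median ratio.

    Returns (scaled_depth, scale). Only pixels where both maps are valid and
    positive contribute to the estimate.
    """
    rel = np.asarray(rel_depth, dtype=float)
    ref = np.asarray(metric_ref, dtype=float)
    ok = np.isfinite(rel) & np.isfinite(ref) & (rel > 0) & (ref > 0)
    if not ok.any():
        raise ValueError("no overlapping valid pixels to estimate scale")
    scale = float(np.median(ref[ok] / rel[ok]))
    return scale * rel, scale
```

For example, with f = 700 px and B = 0.12 m, a disparity of 8.4 px corresponds to roughly 10 m. The median is used instead of a least-squares fit so that a minority of bad stereo matches does not dominate the scale.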
Key takeaway
For computer vision engineers building monocular depth estimation systems, this research indicates that scale-collapse in open-domain scenes can be substantially reduced through the training data: consider fine-tuning on diverse, metrically grounded data such as MetricScenes, or curating similar data yourself with geo-tagged metadata as the absolute scale anchor. In the reported experiments, this improves accuracy for distant objects and vast landscapes while preserving performance on standard benchmarks.
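One way to check whether a model exhibits the scale-collapse described above is to bin pixels by ground-truth distance and look at the median ratio of predicted to true depth in each bin; a ratio that drifts below 1 in the far bins indicates that distant structure is being shrunk. This is a hypothetical diagnostic sketched under that assumption, not the paper's evaluation protocol, and the function name `scale_ratio_by_distance` and any bin edges you pass are illustrative choices.

```python
import numpy as np

def scale_ratio_by_distance(pred, gt, edges):
    """Median pred/gt depth ratio per ground-truth distance bin (hypothetical diagnostic).

    edges: increasing bin boundaries in meters, e.g. [0, 10, 50, 200, inf].
    Returns one ratio per bin; empty bins yield NaN. Values well below 1.0
    in the far bins suggest the model is compressing distant geometry.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    gt = np.asarray(gt, dtype=float).ravel()
    valid = np.isfinite(pred) & np.isfinite(gt) & (pred > 0) & (gt > 0)
    ratios = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = valid & (gt >= lo) & (gt < hi)
        ratios.append(float(np.median(pred[sel] / gt[sel])) if sel.any() else float("nan"))
    return ratios
```

Per-bin medians are used so that a few very close pixels, which usually dominate pixel counts, cannot hide a systematic shortfall at long range.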
Key insights
A new dataset and method mitigate scale-collapse in monocular depth estimation for distant, unconstrained scenes.
Principles
- Training data diversity is crucial for robust metric scale estimation.
- Geo-tagged metadata can provide absolute scale anchors.
- Combining diverse data sources improves model generalization.
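The principle that geo-tags can anchor absolute scale can be illustrated as follows: a structure-from-motion reconstruction of Internet photos is correct only up to a global scale, so comparing distances between reconstructed camera centers with distances between the same cameras' GPS positions yields a meters-per-unit factor. This is a minimal sketch under stated assumptions, not MetricScenes' actual procedure: positions are projected with a local equirectangular approximation (adequate over a few kilometers), cameras are assumed to sit at similar heights so horizontal GPS distance can be compared with 3D reconstruction distance, the median over camera pairs is used to dampen GPS noise, and the function names and the 20 m pair threshold are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def gps_to_local_m(lat, lon, lat0, lon0):
    """Project (lat, lon) in degrees to approximate (east, north) meters around (lat0, lon0)."""
    east = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    return east, north

def metric_scale_from_geotags(sfm_centers, latlons, min_gps_dist_m=20.0):
    """Estimate meters per SfM unit from pairs of geo-tagged cameras (hypothetical helper).

    sfm_centers: list of (x, y, z) camera centers in arbitrary SfM units.
    latlons: list of (lat, lon) in degrees, same order as sfm_centers.
    """
    lat0, lon0 = latlons[0]
    local = [gps_to_local_m(lat, lon, lat0, lon0) for lat, lon in latlons]
    ratios = []
    for i, j in combinations(range(len(sfm_centers)), 2):
        gps_d = math.dist(local[i], local[j])
        sfm_d = math.dist(sfm_centers[i], sfm_centers[j])
        # Skip pairs whose separation is comparable to typical GPS error.
        if gps_d >= min_gps_dist_m and sfm_d > 0:
            ratios.append(gps_d / sfm_d)
    if not ratios:
        raise ValueError("no camera pairs far enough apart to anchor scale")
    return median(ratios)
```

Multiplying the reconstruction's depth maps by the returned factor expresses them in meters; with consumer GPS errors of several meters, widely separated camera pairs carry most of the reliable signal.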
Method
Curate diverse Internet photo collections and stereo imagery; estimate initial depth maps and camera poses with off-the-shelf methods; recover absolute scale from geo-tags and known stereo baselines; then refine the depth maps with two-stage Poisson completion.
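Poisson completion fills holes in a depth map by requiring the completed depth D to match the Laplacian of a guidance signal G on unknown pixels (discretely, sum of 4 neighbors of D minus 4 D equals the same expression for G) while agreeing exactly with the reliable pixels. The summary does not spell out the two stages, so the sketch below shows only a single generic guided Poisson fill. The choice of guidance (for example, a relative depth prediction) is an assumption, the system is solved with damped Jacobi iterations for readability rather than speed, image borders use replicated-edge padding, and `poisson_complete` is a hypothetical name.

```python
import numpy as np

def neighbor_sum(x):
    """Sum of the 4-connected neighbors, replicating values at the image border."""
    p = np.pad(x, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]

def poisson_complete(sparse_depth, known_mask, guidance, iters=5000, omega=0.8):
    """Guided Poisson fill: Laplacian(D) = Laplacian(G), with D fixed on known pixels.

    sparse_depth: HxW array, trusted where known_mask is True.
    guidance:     HxW array whose local structure (Laplacian) is copied into holes.
    Uses damped Jacobi iterations (weight omega); fine for small images only.
    """
    sparse_depth = np.asarray(sparse_depth, dtype=float)
    guidance = np.asarray(guidance, dtype=float)
    lap_g = neighbor_sum(guidance) - 4.0 * guidance
    d = np.where(known_mask, sparse_depth, guidance)  # start from the guidance
    for _ in range(iters):
        jacobi = (neighbor_sum(d) - lap_g) / 4.0      # solve the stencil for the center pixel
        d = (1.0 - omega) * d + omega * jacobi
        d[known_mask] = sparse_depth[known_mask]      # re-impose the trusted depths
    return d
```

Because the guidance enters only through its Laplacian, a constant offset in the guidance is ignored and the anchored pixels set the absolute level. For full-resolution images, assembling the same equations as a sparse matrix and solving them with a direct solver such as `scipy.sparse.linalg.spsolve` would replace the Jacobi loop.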
In practice
- Utilize geo-tagged metadata for absolute scale recovery.
- Employ Poisson completion for depth map refinement.
- Fine-tune existing models on diverse, metrically grounded datasets.
Topics
- Monocular Depth Estimation
- Metric Scale Geometry
- Dataset Curation
- Scale Collapse Mitigation
- Geo-tagged Metadata
- Poisson Completion
Best for: Research Scientist, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.