Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate
Summary
A new study evaluates how reliable multiview 3D consistency metrics are, particularly when the 3D foundation models behind them hallucinate and produce artifacts or inconsistent scenes. Existing evaluation methods typically assume that all views depict a single static 3D scene. That assumption frequently fails in novel view synthesis (NVS) and sparse-view reconstruction because of noise, repeated views, or outlier frames. The researchers introduce "enchmark", a controlled robustness benchmark, together with a parametric framework that decomposes neural metrics such as MEt3R into backbone, residual, and aggregation components; the resulting variants are up to 3x more robust. Their analysis shows that 3D foundation models such as VGGT, MASt3R, DUSt3R, and Fast3R, which can serve as metric backbones, can hallucinate dense geometry and cross-view support even for views of unrelated scenes. To address this, the study proposes COLMAP-based metrics built on feature matches, image registration, dense support, and reconstruction-failure signals. On real NVS outputs, these metrics achieve up to 4x higher correlation with human judgments than MEt3R.
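The backbone, residual, and aggregation decomposition can be illustrated with a toy sketch. Everything below is hypothetical: `consistency_score`, `abs_residual`, and `AGGREGATORS` are illustrative names, each "view" is a plain list of numbers standing in for backbone outputs, and the residual and aggregation choices are assumptions, not MEt3R's or the study's actual components. The sketch only shows that each component can be swapped independently, and that one swap changes how strongly a single unrelated frame moves the score.

```python
import statistics
from typing import Callable, Sequence

# Toy stand-in: each "view" is a flat list of per-pixel values. In a real metric
# a backbone (e.g. a 3D foundation model) would produce these, for instance as
# features warped into a shared frame. That part is assumed, not implemented.
View = Sequence[float]
ResidualFn = Callable[[View, View], list[float]]
AggFn = Callable[[list[float]], float]


def abs_residual(a: View, b: View) -> list[float]:
    """Residual component (assumed choice): per-pixel absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]


def consistency_score(views: Sequence[View], residual: ResidualFn,
                      pixel_agg: AggFn, pair_agg: AggFn) -> float:
    """Aggregate residuals over pixels, then over all view pairs.

    Higher means less consistent in this toy setup.
    """
    pair_scores = [
        pixel_agg(residual(views[i], views[j]))
        for i in range(len(views))
        for j in range(i + 1, len(views))
    ]
    return pair_agg(pair_scores)


# Swapping only the pair-level aggregation yields different metric variants.
AGGREGATORS: dict[str, AggFn] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
}

if __name__ == "__main__":
    consistent = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]
    with_outlier = consistent + [[9.0, 9.0, 9.0, 9.0]]  # one unrelated frame
    for name, agg in AGGREGATORS.items():
        score = consistency_score(with_outlier, abs_residual, statistics.fmean, agg)
        print(f"{name:>6}: {score:.2f}")
```

On four identical views plus one unrelated frame, the mean variant reports 3.20, the median variant 0.00, and the max variant 8.00. Whether ignoring or amplifying such a frame counts as "robust" depends on the benchmark's definition, which this summary does not spell out.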
Key takeaway
For research scientists developing or evaluating 3D foundation models: critically assess existing multiview 3D consistency metrics, since many inherit the hallucinations of the foundation models they are built on. Prioritize integrating COLMAP-based metrics, which rely on geometric verification and reconstruction-failure signals, into your evaluation pipelines. This should yield more reliable assessments of 3D consistency that correlate better with human judgments, and in turn support more robust model development.
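Checking whether a metric "correlates with human judgments" comes down to a rank or linear correlation between per-scene metric scores and human ratings. The summary does not name the coefficient, so the sketch below assumes Spearman's rank correlation, a common choice for this comparison, and computes it in plain Python. If a metric is a distance (higher means less consistent) while humans rate consistency (higher means better), the correlation comes out negative, so compare magnitudes or flip the sign first.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for five generated scenes (not data from the study).
    metric = [0.91, 0.40, 0.75, 0.10, 0.55]
    human = [4.5, 2.0, 4.0, 1.0, 3.5]
    print(spearman(metric, human))
```

In practice `scipy.stats.spearmanr` computes the same statistic; the hand-written version just makes the tie handling explicit.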
Key insights
Multiview 3D consistency metrics can fail when the 3D foundation models they rely on hallucinate, so evaluation needs to be more robust.
Principles
- Neural metrics can be decomposed for robustness analysis.
- Classical geometric verification enhances reliability.
Method
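The study introduces enchmark, a controlled robustness benchmark; a parametric family of neural metrics built from interchangeable backbone, residual, and aggregation components; and COLMAP-based metrics that use matches, registration, dense support, and reconstruction failure as consistency signals.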
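One way to fold the COLMAP signals into a single per-scene score is sketched below. The `ColmapSignals` fields, their [0, 1] normalizations, the equal default weights, and the rule that a failed reconstruction scores zero are all hypothetical choices for illustration, not the study's formula.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ColmapSignals:
    """Hypothetical per-scene signals extracted from a COLMAP run."""
    num_views: int               # generated views given to COLMAP
    num_registered: int          # views registered into the sparse model
    verified_match_ratio: float  # geometrically verified / raw matches, in [0, 1]
    dense_support: float         # fraction of pixels backed by fused dense points, in [0, 1]
    reconstruction_failed: bool  # True if the mapper produced no usable model


def colmap_consistency(sig: ColmapSignals,
                       weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of registration, match, and dense-support terms (higher = more consistent)."""
    if sig.reconstruction_failed or sig.num_views == 0:
        return 0.0  # treat a failed reconstruction as maximal inconsistency
    registration = sig.num_registered / sig.num_views
    terms = (registration, sig.verified_match_ratio, sig.dense_support)
    return sum(w * t for w, t in zip(weights, terms)) / sum(weights)


if __name__ == "__main__":
    # Half the views registered, weak matching, sparse dense support.
    print(colmap_consistency(ColmapSignals(10, 5, 0.5, 0.2, False)))
```

Scoring failure as a hard zero reflects the idea that a reconstruction that cannot be built is itself evidence of inconsistency; a real evaluation might instead report each signal separately.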
In practice
- Use COLMAP-based metrics for NVS evaluation.
- Decompose neural metrics for improved robustness.
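In practice, the registration signal can be read from a COLMAP sparse model exported to text (for example with `colmap model_converter --output_type TXT`). Its `images.txt` lists each registered image on two lines: a pose line of the form `IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME`, then a POINTS2D line that may be empty. The helper names below are hypothetical, and reading a low registration ratio as a failure signal is an interpretation of the summary, not the study's exact procedure.

```python
from typing import Iterable, List


def registered_image_names(images_txt: str) -> List[str]:
    """Return image NAMEs from the text of a COLMAP images.txt file."""
    lines = images_txt.splitlines()
    start = 0
    while start < len(lines) and lines[start].startswith("#"):
        start += 1  # skip the comment header
    names = []
    # Two lines per image: pose line, then a POINTS2D line that may be empty.
    for i in range(start, len(lines), 2):
        pose = lines[i].strip()
        if not pose:
            break  # trailing blank line after the last image
        # IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME: NAME is the 10th field.
        names.append(pose.split(maxsplit=9)[9])
    return names


def registration_ratio(images_txt: str, submitted: Iterable[str]) -> float:
    """Fraction of submitted views that COLMAP registered (0.0 if none submitted)."""
    submitted_names = list(submitted)
    if not submitted_names:
        return 0.0
    registered = set(registered_image_names(images_txt))
    return sum(name in registered for name in submitted_names) / len(submitted_names)
```

A ratio well below 1.0, or a mapper run that writes no model at all, suggests the generated views do not support a single coherent reconstruction.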
Topics
- Multiview 3D Consistency
- 3D Foundation Models
- Neural Reconstruction Priors
- Geometric Verification
- COLMAP-based Metrics
Best for: Research Scientist, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.