StereoFactory: A Unified Merging Framework for Robust Stereo Matching

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

StereoFactory is a unified merging framework for robust stereo matching that addresses the scalability bottleneck of foundation models. Incorporating new data traditionally requires costly joint retraining, while existing model-merging techniques often introduce harmful task-vector interference. StereoFactory instead uses a coarse-to-fine evolutionary search: Stage 1 runs a genetic algorithm to select the subset of models to merge, and Stage 2 uses CMA-ES to optimize architecture-adaptive routing and, optionally, module-level scaling, so that each module draws on the knowledge it specializes in. Experiments across two architectures (NMRF and FoundationStereo) and four benchmarks show that StereoFactory reduces average error from 3.80 to 3.30 on NMRF and from 2.88 to 2.19 on FoundationStereo. Because the search is post hoc, it needs only 2.7-3.7% of the wall-clock time of joint retraining. The analysis also indicates that knowledge contributions are module-specific and transfer across architectures.
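The headline figures are easier to compare as relative changes. The short Python check that follows uses only the numbers quoted above; it is plain arithmetic on the reported averages, not a re-analysis of per-benchmark results. It works out to roughly a 13% (NMRF) and 24% (FoundationStereo) drop in average error, and a search that takes about 27x to 37x less wall-clock time than joint retraining.

```python
# Figures quoted in the summary: (error before, error after) per architecture.
reported = {"NMRF": (3.80, 3.30), "FoundationStereo": (2.88, 2.19)}
# Search wall-clock time as a share of joint retraining time.
time_fractions = (0.027, 0.037)


def relative_reduction(before: float, after: float) -> float:
    """Share of the original average error that was removed."""
    return (before - after) / before


for arch, (before, after) in reported.items():
    print(f"{arch}: {relative_reduction(before, after):.1%} lower average error")

for frac in time_fractions:
    print(f"{frac:.1%} of retraining time = about {1 / frac:.0f}x less wall-clock time")
```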

Key takeaway

For computer vision engineers integrating new datasets into stereo matching models, StereoFactory offers a compelling alternative to costly joint retraining: it lowered average error on both tested architectures, NMRF and FoundationStereo, while using only 2.7-3.7% of the wall-clock time that joint retraining requires. Consider evolutionary model merging when you need to fold specialized knowledge into an existing model without retraining it from scratch.

Key insights

StereoFactory adaptively merges specialized models with evolutionary search, improving stereo matching accuracy at a small fraction of the cost of joint retraining.
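To make "merging specialized models" concrete, here is a minimal sketch assuming a task-arithmetic style merge: each specialist contributes a task vector (its fine-tuned weights minus the shared base weights), and a per-module coefficient decides how much of each task vector enters each module. The dictionary-of-lists parameter format, the module names, and the functions task_vector and merge are hypothetical illustrations, not StereoFactory's code. Setting a coefficient to 0 is one simple way to keep a specialist out of a module where its update would interfere.

```python
from typing import Dict, List

# Hypothetical toy format: module name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-module difference between a fine-tuned specialist and the base."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])]
            for name in base}


def merge(base: Params, specialists: List[Params],
          module_scales: Dict[str, List[float]]) -> Params:
    """Add each specialist's task vector to the base, scaled per module.

    module_scales[name][k] weights specialist k inside module `name`;
    0.0 leaves that specialist out of that module entirely.
    """
    merged = {name: list(weights) for name, weights in base.items()}  # copy, base untouched
    for k, spec in enumerate(specialists):
        for name, delta in task_vector(spec, base).items():
            scale = module_scales[name][k]
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


if __name__ == "__main__":
    base = {"encoder": [0.0, 0.0], "refine": [1.0]}
    spec_a = {"encoder": [0.4, 0.0], "refine": [1.2]}
    spec_b = {"encoder": [0.0, -0.2], "refine": [0.6]}
    # Keep both encoder updates, but drop specialist B from the refinement module.
    scales = {"encoder": [1.0, 1.0], "refine": [1.0, 0.0]}
    print(merge(base, [spec_a, spec_b], scales))
    # -> {'encoder': [0.4, -0.2], 'refine': [1.2]}
```

Scaling per module rather than per model is what lets a merge keep a specialist's useful encoder update while rejecting its refinement update, which is the kind of module-specific contribution the summary describes.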

Principles

Method

StereoFactory uses a coarse-to-fine evolutionary framework. Stage 1 runs a genetic algorithm to select which specialized models to merge; Stage 2 then applies CMA-ES to optimize architecture-adaptive routing and, optionally, module-level scaling, so that each module integrates the specialized knowledge it benefits from.
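A compact sketch of the two-stage idea, under heavy assumptions. The objective is a made-up toy (GAINS, INTERFERENCE, and BASE_ERROR are invented constants) that rewards adding useful specialists but penalizes every merged pair, mimicking task-vector interference; a real search would merge weights and evaluate stereo error. Stage 1 is a standard genetic algorithm over inclusion masks. For Stage 2 the sketch swaps CMA-ES for a simpler isotropic Gaussian evolution strategy over per-module scale factors: CMA-ES additionally adapts a full covariance matrix and its step size, and the architecture-adaptive routing is not modeled at all.

```python
import random
from typing import Callable, List, Sequence

# --- Hypothetical toy objective (NOT from the paper) -----------------------
# Stands in for "average validation error of the merged stereo model".
GAINS = [0.30, 0.25, 0.05, 0.20, 0.02, 0.15]  # made-up benefit per specialist
INTERFERENCE = 0.08                            # made-up penalty per merged pair
BASE_ERROR = 3.0


def subset_error(mask: Sequence[int]) -> float:
    """Toy error: specialists help, but every extra pair adds interference."""
    k = sum(mask)
    gain = sum(g for g, bit in zip(GAINS, mask) if bit)
    return BASE_ERROR - gain + INTERFERENCE * k * (k - 1) / 2


def genetic_subset_search(n: int, fitness: Callable[[Sequence[int]], float],
                          pop_size: int = 20, generations: int = 30,
                          mutation_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Stage 1 sketch: GA over binary masks (1 = merge that specialist)."""
    rng = random.Random(seed)
    # Seed with the naive "merge everything" mask so the GA can only improve on it.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)                   # lower error is better
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in a[:cut] + b[cut:]]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


def gaussian_es(dim: int, fitness: Callable[[Sequence[float]], float],
                pop_size: int = 16, elite: int = 4, sigma: float = 0.3,
                generations: int = 60, seed: int = 0) -> List[float]:
    """Stage 2 stand-in: isotropic (mu, lambda) evolution strategy.

    Simpler than CMA-ES: no covariance adaptation, sigma just decays.
    """
    rng = random.Random(seed)
    mean = [1.0] * dim                          # 1.0 = unscaled task vectors
    for _ in range(generations):
        samples = [[m + rng.gauss(0.0, sigma) for m in mean]
                   for _ in range(pop_size)]
        samples.sort(key=fitness)
        mean = [sum(col) / elite for col in zip(*samples[:elite])]  # recombine elites
        sigma *= 0.95
    return mean


if __name__ == "__main__":
    mask = genetic_subset_search(len(GAINS), subset_error)
    chosen = [i for i, bit in enumerate(mask) if bit]
    print("Stage 1 subset:", chosen, "toy error:", round(subset_error(mask), 3))

    # Hypothetical "ideal" scales for 3 modules x each chosen specialist.
    target = [0.8, 0.3, 1.1] * len(chosen)

    def scale_error(scales: Sequence[float]) -> float:
        return sum((s - t) ** 2 for s, t in zip(scales, target))

    scales = gaussian_es(len(target), scale_error)
    print("Stage 2 scales:", [round(s, 2) for s in scales])
```

Seeding the GA with the "merge everything" mask and keeping the top half of each generation guarantees that, on the toy objective, the returned subset is never worse than naive full merging. For a more faithful Stage 2, the ask/tell loop of the pycma package's CMAEvolutionStrategy could replace gaussian_es.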


Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.