StereoFactory: A Unified Merging Framework for Robust Stereo Matching
Summary
StereoFactory is a unified model-merging framework for robust stereo matching that targets the scalability bottleneck of foundation models. Conventional approaches require costly joint retraining whenever new data arrives, while existing merging techniques often suffer from interference between task vectors. StereoFactory instead runs a coarse-to-fine evolutionary search. Stage 1 uses a genetic algorithm to identify the best subset of models to merge. Stage 2 uses CMA-ES to optimize architecture-adaptive routing and optional module-level scaling, targeting module-level knowledge specialization. In experiments on two architectures and four benchmarks, StereoFactory reduces average error from 3.80 to 3.30 with NMRF and from 2.88 to 2.19 with FoundationStereo. The post-hoc search takes only 2.7-3.7% of the wall-clock time needed for joint retraining. Further analysis indicates that knowledge contributions are module-specific and transferable across architectures.
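To put the quoted error figures in relative terms, the short sketch below computes the fractional reduction for each architecture. Only the four error values come from the summary; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional error reduction when going from `before` to `after`."""
    return (before - after) / before


# Average errors quoted in the summary, as (before, after) pairs per architecture.
reported = {
    "NMRF": (3.80, 3.30),
    "FoundationStereo": (2.88, 2.19),
}

for name, (before, after) in reported.items():
    pct = 100.0 * relative_reduction(before, after)
    print(f"{name}: {pct:.1f}% lower average error")
```

This works out to roughly a 13% relative reduction with NMRF and roughly 24% with FoundationStereo.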
Key takeaway
For computer vision engineers integrating new datasets into stereo matching models, StereoFactory offers an alternative to costly joint retraining. The reported results show lower average error with both the NMRF and FoundationStereo architectures, while the search takes only 2.7-3.7% of the wall-clock time of joint retraining. Consider evolutionary model merging as a way to integrate specialized knowledge adaptively and improve your model's robustness.
Key insights
StereoFactory adaptively merges specialized models with an evolutionary search, lowering stereo matching error at a fraction of the cost of joint retraining.
Principles
- Knowledge contributions are module-specific.
- Merged subsets transfer across architectures.
- Adaptive merging reduces task-vector interference.
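The principles above can be illustrated with a minimal sketch of task-vector merging with per-module scales. This is an assumption-laden toy, not the paper's formulation: it uses the common "base plus scaled task vectors" rule, represents each module as a short list of floats, and the module names `encoder` and `decoder` are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: module name -> flattened parameters.
Weights = Dict[str, List[float]]


def task_vector(base: Weights, expert: Weights) -> Weights:
    """Per-module difference between a fine-tuned expert and the shared base."""
    return {m: [e - b for e, b in zip(expert[m], base[m])] for m in base}


def merge(base: Weights, experts: List[Weights],
          scales: List[Dict[str, float]]) -> Weights:
    """Add each expert's task vector to the base, scaled per module.

    A module missing from an expert's scale dict gets scale 0.0, so that
    expert contributes nothing to that module.
    """
    merged = {m: list(p) for m, p in base.items()}
    for expert, scale in zip(experts, scales):
        tv = task_vector(base, expert)
        for m in merged:
            s = scale.get(m, 0.0)
            merged[m] = [w + s * d for w, d in zip(merged[m], tv[m])]
    return merged


# Hypothetical experts: A changed only the encoder, B only the decoder.
base = {"encoder": [0.0, 0.0], "decoder": [1.0, 1.0]}
expert_a = {"encoder": [2.0, 0.0], "decoder": [1.0, 1.0]}
expert_b = {"encoder": [0.0, 0.0], "decoder": [1.0, 3.0]}

merged = merge(base, [expert_a, expert_b],
               [{"encoder": 0.5}, {"decoder": 1.0}])
print(merged)
```

Scaling each module separately lets one expert contribute to the encoder while another contributes to the decoder, which is one simple way to limit interference between task vectors.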
Method
StereoFactory uses a coarse-to-fine evolutionary framework. Stage 1 employs a genetic algorithm to select the subset of models to merge. Stage 2 then applies CMA-ES to optimize architecture-adaptive routing and optional module-level scaling, integrating the specialized knowledge of the selected models.
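The two-stage search can be sketched on a toy problem as follows. Everything here is an illustrative assumption: the objective `toy_error` and its `IDEAL` values are made up, routing is omitted, scales are per model rather than per module, and Stage 2 uses a simple elitist Gaussian evolution strategy as a stand-in for CMA-ES rather than CMA-ES itself.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = Tuple[int, ...]

# Hypothetical "ideal" contribution of each candidate model (toy values).
IDEAL = [0.8, 0.0, 1.2, 0.6]


def toy_error(mask: Sequence[int], scales: Sequence[float]) -> float:
    """Made-up validation error for a merge defined by a mask and scales."""
    return sum((m * s - t) ** 2 for m, s, t in zip(mask, scales, IDEAL))


def ga_select_subset(error_fn: Callable[[Mask], float], n_models: int,
                     pop_size: int = 12, generations: int = 15,
                     seed: int = 0) -> Mask:
    """Stage 1 (coarse): genetic algorithm over binary inclusion masks."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_models))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error_fn)
        parents = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_models)    # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:              # occasional bit-flip mutation
                child[rng.randrange(n_models)] ^= 1
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=error_fn)


def es_refine_scales(error_fn: Callable[[List[float]], float], dim: int,
                     sigma: float = 0.3, pop_size: int = 8,
                     generations: int = 30, seed: int = 0) -> List[float]:
    """Stage 2 (fine): elitist Gaussian ES, a simple stand-in for CMA-ES."""
    rng = random.Random(seed)
    best = [1.0] * dim                          # start from unscaled merging
    best_err = error_fn(best)
    for _ in range(generations):
        for _ in range(pop_size):
            cand = [x + rng.gauss(0.0, sigma) for x in best]
            err = error_fn(cand)
            if err < best_err:                  # accept only improvements
                best, best_err = cand, err
        sigma *= 0.95                           # slowly shrink the step size
    return best


n = len(IDEAL)
mask = ga_select_subset(lambda m: toy_error(m, [1.0] * n), n)
coarse_err = toy_error(mask, [1.0] * n)
scales = es_refine_scales(lambda s: toy_error(mask, s), n)
fine_err = toy_error(mask, scales)
print("mask:", mask, "coarse error:", coarse_err, "fine error:", fine_err)
```

Because the refinement starts from unit scales and only accepts improvements, the fine-stage error can never exceed the coarse-stage error for the selected subset. In a real setting, a full CMA-ES implementation would also adapt the covariance of the sampling distribution, which this stand-in does not.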
In practice
- Reduce retraining costs for new data.
- Improve stereo matching error rates.
- Integrate module-specific knowledge.
Topics
- Stereo Matching
- Model Merging
- Evolutionary Algorithms
- Computer Vision
- Knowledge Transfer
- Machine Learning Efficiency
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.