Liquid Fusion of Heterogeneous Representations Towards General Salient Object Detection

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision) · Depth: Expert, quick

Summary

The Liquid Fusion Network (LFNet) targets a limitation in general Salient Object Detection (SOD): existing methods neglect the inherent spectral biases of State Space Models (SSMs) and Convolutional Neural Networks (CNNs), two paradigms with complementary frequency preferences. Inspired by Liquid Neural Networks (LNNs), LFNet introduces a liquid fusion mechanism that dynamically integrates features from a VMamba (SSM) backbone and a ConvNeXt (CNN) backbone. It treats VMamba features as evolving states and ConvNeXt features as exogenous stimuli, and uses dynamic gating for content-aware aggregation. This state-stimulus paradigm also keeps the fusion flexible enough to accommodate multi-modal cues. LFNet additionally incorporates a Saliency-Guided Upsampling (SGU) operator, whose spectral-spatial co-design mitigates upsampling artifacts while preserving semantic information. Experiments across five tasks (RGB, RGB-D, RGB-T, VSOD, and VDT) show state-of-the-art performance and a favorable balance between detection accuracy and model efficiency.

Key takeaway

For computer vision engineers building Salient Object Detection (SOD) systems, LFNet is worth evaluating when a project involves multi-modal data or needs a better trade-off between accuracy and efficiency. Its dynamic fusion of SSM and CNN features addresses the spectral biases of the two paradigms, and it reports state-of-the-art results across tasks such as RGB-D and VSOD. The released code is a starting point for integrating liquid fusion and Saliency-Guided Upsampling into your own models.

Key insights

Harmonizing complementary spectral biases of CNNs and SSMs via dynamic liquid fusion improves salient object detection.
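
To make "complementary frequency preferences" concrete, the following minimal sketch splits a 1-D signal into a low-frequency band (a moving average) and a high-frequency residual. It is an illustration only: the function names, the `radius` parameter, and the moving-average filter are assumptions chosen for clarity, not LFNet's analysis, and this summary does not state which backbone favors which band.

```python
def low_pass(signal, radius=1):
    """Moving average with edge clamping; keeps slowly varying content."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        # Clamp indices so the window stays inside the signal at the borders.
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def split_bands(signal, radius=1):
    """Return (low, high) such that low[i] + high[i] reconstructs signal[i]."""
    low = low_pass(signal, radius)
    high = [x - l for x, l in zip(signal, low)]
    return low, high


if __name__ == "__main__":
    # A step (coarse structure) with a fast alternating pattern on top (fine detail).
    signal = [0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0]
    low, high = split_bands(signal)
    print("low :", low)
    print("high:", high)
```

The two bands carry different information about the same input: the low band follows the step, while the high band holds the alternating detail. A feature extractor biased toward one band benefits from being paired with one biased toward the other, which is the motivation the insight above describes.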

Method

LFNet dynamically integrates VMamba features (the state) and ConvNeXt features (the stimulus) through a gating mechanism, then applies Saliency-Guided Upsampling to reduce upsampling artifacts as features are propagated.
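
A minimal sketch of the state-stimulus idea, assuming a simple sigmoid gate that blends each state value with its matching stimulus value. The function `gated_fuse`, its scalar weights, and the `steps` parameter are hypothetical; this summary does not give LFNet's actual update rule, and a real model would apply learned per-channel weights to feature maps rather than to short lists.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, used as the gate nonlinearity."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(state, stimulus, w_state=1.0, w_stim=1.0, bias=0.0, steps=1):
    """Blend a stimulus vector into a state vector with a content-dependent gate.

    state    -- stand-in for SSM features (treated as the evolving state)
    stimulus -- stand-in for CNN features (treated as the external input)
    The weights and bias are hypothetical scalars, not learned parameters.
    """
    if len(state) != len(stimulus):
        raise ValueError("state and stimulus must have the same length")
    fused = list(state)
    for _ in range(steps):
        updated = []
        for s, u in zip(fused, stimulus):
            # The gate depends on both inputs, so the mix varies per element.
            gate = sigmoid(w_state * s + w_stim * u + bias)
            # Convex blend: the result always lies between s and u.
            updated.append((1.0 - gate) * s + gate * u)
        fused = updated
    return fused


if __name__ == "__main__":
    ssm_features = [0.2, -1.0, 0.5]
    cnn_features = [0.8, 0.4, 0.5]
    print(gated_fuse(ssm_features, cnn_features))
```

Because the gate is computed from the inputs themselves, the same function leans on the stimulus in some positions and on the state in others. Raising `steps` repeats the update, which moves the state further toward the stimulus where the gate is open.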

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.