CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; image fusion; long-range dependencies; multilayer perceptron (MLP); NETWORK; ARCHITECTURE; NEST;
DOI
10.1109/JSEN.2024.3410387
CLC Classification
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore the long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods primarily concentrate on capturing dependencies within each modality and pay minimal attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially leading to a loss of details and texture in the fused image. In this article, we explore a simple but effective network structure to equivalently model long-range dependencies and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. Our modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to enhance the ability of our network to extract local fine-grained features, thereby enriching textures in the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
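The two building blocks the abstract names, a depthwise separable convolution for local filtering and a residual channel MLP for global mixing, can be illustrated with a minimal NumPy sketch. The function names, tensor shapes, ReLU nonlinearity, and residual placement below are illustrative assumptions, not the paper's actual CMFuse implementation.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution on a (C, H, W) feature map.

    dw_kernels: (C, k, k), one spatial kernel per channel (depthwise step).
    pw_weights: (C_out, C), a 1x1 pointwise convolution mixing channels.
    Zero padding keeps the spatial size unchanged.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    dw = np.zeros_like(x)
    for c in range(C):  # depthwise: each channel is filtered independently
        for i in range(H):
            for j in range(W):
                dw[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * dw_kernels[c])
    # pointwise 1x1 convolution mixes information across channels
    return np.einsum('oc,chw->ohw', pw_weights, dw)

def channel_mlp_mix(x, w1, w2):
    """Residual channel MLP applied at every spatial location.

    w1: (hidden, C), w2: (C, hidden). Every pixel is treated as a token and
    passed through a two-layer MLP over its channel vector; the residual
    connection mirrors the 'residual MLP' design mentioned in the abstract.
    """
    C, H, W = x.shape
    tokens = x.reshape(C, -1).T               # (H*W, C): one token per pixel
    hidden = np.maximum(tokens @ w1.T, 0.0)   # per-token MLP over channels
    mixed = hidden @ w2.T
    return x + mixed.T.reshape(C, H, W)       # residual connection
```

In this sketch, an intra-modality mixer would apply both operations to one modality's features, while an inter-modality mixer would concatenate infrared and visible feature maps along the channel axis before the channel MLP, so information flows across modalities.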
Pages: 24152 - 24167
Page count: 16
Related Papers
50 records in total
  • [1] Cross-Modal Transformers for Infrared and Visible Image Fusion
    Park, Seonghyun
    Vien, An Gia
    Lee, Chul
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 770 - 785
  • [2] Infrared and visible image fusion based on cross-modal extraction strategy
    Liu, Xiaowen
    Li, Jing
    Yang, Xin
    Huo, Hongtao
    INFRARED PHYSICS & TECHNOLOGY, 2022, 124
  • [3] Infrared and Visible Cross-Modal Image Retrieval Through Shared Features
    Liu, Fangcen
    Gao, Chenqiang
    Sun, Yongqing
    Zhao, Yue
    Yang, Feng
    Qin, Anyong
    Meng, Deyu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (11) : 4485 - 4496
  • [4] CCAFusion: Cross-Modal Coordinate Attention Network for Infrared and Visible Image Fusion
    Li, Xiaoling
    Li, Yanfeng
    Chen, Houjin
    Peng, Yahui
    Pan, Pan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 866 - 881
  • [5] Efficient multi-level cross-modal fusion and detection network for infrared and visible image
    Gao, Hongwei
    Wang, Yutong
    Sun, Jian
    Jiang, Yueqiu
    Gai, Yonggang
    Yu, Jiahui
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 108 : 306 - 318
  • [6] TCTFusion: A Triple-Branch Cross-Modal Transformer for Adaptive Infrared and Visible Image Fusion
    Zhang, Liang
    Jiang, Yueqiu
    Yang, Wei
    Liu, Bo
    ELECTRONICS, 2025, 14 (04)
  • [7] CMFA_Net: A cross-modal feature aggregation network for infrared-visible image fusion
    Ding, Zhaisheng
    Li, Haiyan
    Zhou, Dongming
    Li, Hongsong
    Liu, Yanyu
    Hou, Ruichao
    INFRARED PHYSICS & TECHNOLOGY, 2021, 118
  • [8] BCMFIFuse: A Bilateral Cross-Modal Feature Interaction-Based Network for Infrared and Visible Image Fusion
    Gao, Xueyan
    Liu, Shiguang
    REMOTE SENSING, 2024, 16 (17)
  • [9] Fast Graph Convolution Network Based Multi-label Image Recognition via Cross-modal Fusion
    Wang, Yangtao
    Xie, Yanzhao
    Liu, Yu
    Zhou, Ke
    Li, Xiaocui
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 1575 - 1584
  • [10] MCFusion: infrared and visible image fusion based multiscale receptive field and cross-modal enhanced attention mechanism
    Jiang, Min
    Wang, Zhiyuan
    Kong, Jun
    Zhuang, Danfeng
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (01)