CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; Long-range dependencies; Multilayer perceptron (MLP); Network; Architecture; Nest
DOI
10.1109/JSEN.2024.3410387
Chinese Library Classification
TM (Electrical Engineering); TN (Electronics and Communication Technology)
Discipline Codes
0808; 0809
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods concentrate primarily on capturing dependencies within each modality and pay little attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially causing a loss of detail and texture in the fused image. In this article, we explore a simple but effective network structure that equivalently models long-range dependencies and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. These modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to strengthen the network's ability to extract local fine-grained features, thereby enriching the textures of the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
Pages: 24152-24167 (16 pages)
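The abstract describes mixing modules built from residual MLPs (for a global receptive field across spatial positions) and depthwise separable convolutions, plus a pointwise fusion of the two modalities' features. The following is a minimal NumPy sketch of those three ingredients; it is not the authors' implementation, and all shapes, kernel sizes, and the exact arrangement of the Intra-Conv-MLP / Inter-Conv-MLP blocks are assumptions for illustration only.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depthwise half of a depthwise separable conv: each channel is
    filtered independently. x: (C, H, W), kernels: (C, kh, kw),
    'same' zero padding, stride 1. (Kernel size is an assumption.)"""
    C, H, W = x.shape
    kh, kw = kernels.shape[1:]
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + kh, j:j + kw] * kernels[c])
    return out

def residual_token_mlp(x, w1, w2):
    """Residual MLP mixing across all spatial positions at once, which is
    what gives an MLP its global receptive field. x: (C, N) with N = H*W
    flattened tokens; w1: (N, hidden), w2: (hidden, N)."""
    return x + np.maximum(x @ w1, 0.0) @ w2  # ReLU is an assumed activation

def inter_modal_mix(feat_ir, feat_vis, w_fuse):
    """Cross-modal interaction as a 1x1 (pointwise) conv over the
    concatenated channels of both modalities. feat_*: (C, H, W);
    w_fuse: (C, 2C) -> fused feature map of shape (C, H, W)."""
    cat = np.concatenate([feat_ir, feat_vis], axis=0)  # (2C, H, W)
    return np.einsum('oc,chw->ohw', w_fuse, cat)
```

A depthwise conv followed by `inter_modal_mix`-style pointwise mixing is exactly the depthwise separable factorization, while `residual_token_mlp` couples every pixel with every other pixel in one matrix product rather than through attention.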