CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; image fusion; long-range dependencies; multilayer perceptron (MLP); NETWORK; ARCHITECTURE; NEST;
DOI
10.1109/JSEN.2024.3410387
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods concentrate primarily on capturing dependencies within each modality and pay minimal attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially causing a loss of details and texture in the fused image. In this article, we explore a simple but effective network structure that equivalently models long-range dependencies and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. These modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to enhance the ability of the network to extract local fine-grained features, thereby enriching textures in the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
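The abstract names two building blocks: depthwise separable convolutions and residual channel-mixing MLPs, applied within one modality (Intra-Conv-MLP) and across the concatenated infrared/visible features (Inter-Conv-MLP). As a rough illustration only, here is a minimal NumPy sketch of those ideas; the actual CMFuse layer sizes, activations, and wiring are not given in this record, so all function names, shapes, and the concatenate-then-split inter-modality scheme below are assumptions, not the authors' implementation.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with 'same' padding.
    x: (C, H, W) feature map, kernels: (C, 3, 3), one kernel per channel."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def residual_mlp_mix(x, w1, w2):
    """Channel-mixing MLP with a residual connection: each pixel's channel
    vector is mixed by a two-layer MLP (ReLU hidden layer), so every output
    channel sees all input channels -- a global receptive field over channels.
    x: (C, H, W), w1: (C, hidden), w2: (hidden, C)."""
    C, H, W = x.shape
    tokens = x.reshape(C, -1).T            # (H*W, C): one token per pixel
    hidden = np.maximum(tokens @ w1, 0.0)  # ReLU hidden layer
    return x + (hidden @ w2).T.reshape(C, H, W)

def inter_modality_mix(ir_feat, vis_feat, w1, w2):
    """Hypothetical inter-modality mixing: stack infrared and visible
    features along the channel axis, mix them jointly, then split back.
    ir_feat, vis_feat: (C, H, W); w1: (2C, hidden), w2: (hidden, 2C)."""
    joint = np.concatenate([ir_feat, vis_feat], axis=0)   # (2C, H, W)
    mixed = residual_mlp_mix(joint, w1, w2)
    c = ir_feat.shape[0]
    return mixed[:c], mixed[c:]
```

A depthwise 3x3 pass followed by such a channel MLP approximates one "mixing" step: the convolution gathers local spatial context per channel, while the MLP exchanges information across all channels (and, in the inter-modality variant, across both modalities) at every spatial position.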
Pages: 24152-24167
Page count: 16