CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited by: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; image fusion; long-range dependencies; multilayer perceptron (MLP); NETWORK; ARCHITECTURE; NEST;
DOI
10.1109/JSEN.2024.3410387
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods primarily concentrate on capturing dependencies within each modality and pay minimal attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially causing a loss of detail and texture in the fused image. In this article, we explore a simple but effective network structure that models long-range dependencies equivalently and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. These modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to enhance the network's ability to extract local fine-grained features, thereby enriching textures in the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
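The exact module definitions are in the repository above; as a rough illustration only, the following PyTorch sketch shows how a Conv-MLP-style mixing block could pair a depthwise separable convolution (local mixing) with an MLP-Mixer-style token MLP (global spatial receptive field), and how an inter-modality variant could fuse infrared and visible features by channel concatenation. All class names, layer sizes, and the fixed-resolution assumption are illustrative guesses, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv (local mixing)."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class MixerBlock(nn.Module):
    """MLP-Mixer-style block: a token-mixing MLP over all H*W positions
    (global receptive field) plus a channel-mixing MLP, both residual."""
    def __init__(self, channels, num_tokens, token_hidden=256, ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_tokens))
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels * ratio), nn.GELU(),
            nn.Linear(channels * ratio, channels))

    def forward(self, tokens):                      # tokens: (B, N, C)
        y = self.norm1(tokens).transpose(1, 2)      # (B, C, N)
        tokens = tokens + self.token_mlp(y).transpose(1, 2)  # mix across positions
        tokens = tokens + self.channel_mlp(self.norm2(tokens))
        return tokens


class IntraConvMLP(nn.Module):
    """Hypothetical intra-modality block: local conv mixing, then global MLP
    mixing. Assumes a fixed feature-map resolution so the token MLP has a
    fixed input width."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.local = DepthwiseSeparableConv(channels)
        self.mixer = MixerBlock(channels, num_tokens=height * width)

    def forward(self, x):                           # x: (B, C, H, W)
        x = x + self.local(x)                       # residual local mixing
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        tokens = self.mixer(tokens)                 # residual global mixing
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class InterConvMLP(nn.Module):
    """Hypothetical inter-modality block: concatenate infrared and visible
    features along channels so the MLPs mix information across modalities,
    then split the result back into two streams with 1x1 convs."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.mix = IntraConvMLP(channels * 2, height, width)
        self.to_ir = nn.Conv2d(channels * 2, channels, 1)
        self.to_vis = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, f_ir, f_vis):
        fused = self.mix(torch.cat([f_ir, f_vis], dim=1))
        return self.to_ir(fused), self.to_vis(fused)


if __name__ == "__main__":
    f_ir = torch.randn(1, 32, 16, 16)               # toy infrared features
    f_vis = torch.randn(1, 32, 16, 16)              # toy visible features
    block = InterConvMLP(channels=32, height=16, width=16)
    out_ir, out_vis = block(f_ir, f_vis)
    print(out_ir.shape, out_vis.shape)              # torch.Size([1, 32, 16, 16]) x2
```

The token-mixing MLP is what provides a global receptive field without self-attention, while the depthwise separable convolution restores the local inductive bias that, per the abstract, pure attention-based fusion networks lack.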
Pages: 24152-24167
Page count: 16
Related Papers
50 records in total
  • [41] Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning
    Xiang, Nan
    Chen, Ling
    Liang, Leiyan
    Rao, Xingdi
    Gong, Zehao
    ELECTRONICS, 2023, 12 (17)
  • [42] Cross-modal fusion for multi-label image classification with attention mechanism
    Wang, Yangtao
    Xie, Yanzhao
    Zeng, Jiangfeng
    Wang, Hanpin
    Fan, Lisheng
    Song, Yufan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [44] DCMFNet: Deep Cross-Modal Fusion Network for Referring Image Segmentation with Iterative Gated Fusion
    Huang, Zhen
    Xue, Mingcheng
    Liu, Yu
    Xu, Kaiping
    Li, Jiangquan
    Yu, Chenyang
    PROCEEDINGS OF THE 50TH GRAPHICS INTERFACE CONFERENCE, GI 2024, 2024
  • [45] Unified Adversarial Patch for Visible-Infrared Cross-Modal Attacks in the Physical World
    Wei, Xingxing
    Huang, Yao
    Sun, Yitong
    Yu, Jie
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (04) : 2348 - 2363
  • [46] Mixed-scale cross-modal fusion network for referring image segmentation
    Pan, Xiong
    Xie, Xuemei
    Yang, Jianxiu
    NEUROCOMPUTING, 2025, 614
  • [47] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [48] Cascaded Cross-modal Alignment for Visible-Infrared Person Re-Identification
    Li, Zhaohui
    Wang, Qiangchang
    Chen, Lu
    Zhang, Xinxin
    Yin, Yilong
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [49] Infrared-Visible Cross-Modal Person Re-Identification with an X Modality
    Li, Diangang
    Wei, Xing
    Hong, Xiaopeng
    Gong, Yihong
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 4610 - 4617
  • [50] Infrared-visible cross-modal person re-identification via dual-attention collaborative learning
    Li, Yunshang
    Chen, Ying
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2022, 109