CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited by: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; image fusion; long-range dependencies; multilayer perceptron (MLP); NETWORK; ARCHITECTURE; NEST;
DOI
10.1109/JSEN.2024.3410387
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods primarily concentrate on capturing dependencies within each modality and pay minimal attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially causing a loss of detail and texture in the fused image. In this article, we explore a simple but effective network structure that models long-range dependencies equivalently and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. These modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to enhance the network's ability to extract local fine-grained features, thereby enriching textures in the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
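The exact module definitions are in the repository above; as a rough illustration only, the following PyTorch sketch shows how a Conv-MLP-style mixing block could pair a depthwise separable convolution (local mixing) with an MLP-Mixer-style token MLP (global spatial receptive field), and how an inter-modality variant could fuse infrared and visible features by channel concatenation. All class names, layer sizes, and the fixed-resolution assumption are illustrative guesses, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv (local mixing)."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class MixerBlock(nn.Module):
    """MLP-Mixer-style block: a token-mixing MLP over all H*W positions
    (global receptive field) plus a channel-mixing MLP, both residual."""
    def __init__(self, channels, num_tokens, token_hidden=256, ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_tokens))
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels * ratio), nn.GELU(),
            nn.Linear(channels * ratio, channels))

    def forward(self, tokens):                      # tokens: (B, N, C)
        y = self.norm1(tokens).transpose(1, 2)      # (B, C, N)
        tokens = tokens + self.token_mlp(y).transpose(1, 2)  # mix across positions
        tokens = tokens + self.channel_mlp(self.norm2(tokens))
        return tokens


class IntraConvMLP(nn.Module):
    """Hypothetical intra-modality block: local conv mixing, then global MLP
    mixing. Assumes a fixed feature-map resolution so the token MLP has a
    fixed input width."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.local = DepthwiseSeparableConv(channels)
        self.mixer = MixerBlock(channels, num_tokens=height * width)

    def forward(self, x):                           # x: (B, C, H, W)
        x = x + self.local(x)                       # residual local mixing
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        tokens = self.mixer(tokens)                 # residual global mixing
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class InterConvMLP(nn.Module):
    """Hypothetical inter-modality block: concatenate infrared and visible
    features along channels so the MLPs mix information across modalities,
    then split the result back into two streams with 1x1 convs."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.mix = IntraConvMLP(channels * 2, height, width)
        self.to_ir = nn.Conv2d(channels * 2, channels, 1)
        self.to_vis = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, f_ir, f_vis):
        fused = self.mix(torch.cat([f_ir, f_vis], dim=1))
        return self.to_ir(fused), self.to_vis(fused)


if __name__ == "__main__":
    f_ir = torch.randn(1, 32, 16, 16)               # toy infrared features
    f_vis = torch.randn(1, 32, 16, 16)              # toy visible features
    block = InterConvMLP(channels=32, height=16, width=16)
    out_ir, out_vis = block(f_ir, f_vis)
    print(out_ir.shape, out_vis.shape)              # torch.Size([1, 32, 16, 16]) x2
```

The token-mixing MLP is what provides a global receptive field without self-attention, while the depthwise separable convolution restores the local inductive bias that, per the abstract, pure attention-based fusion networks lack.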
Pages: 24152-24167
Page count: 16
Related Papers
50 records in total
  • [41] Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning
    Xiang, Nan
    Chen, Ling
    Liang, Leiyan
    Rao, Xingdi
    Gong, Zehao
    ELECTRONICS, 2023, 12 (17)
  • [42] Cross-modal fusion for multi-label image classification with attention mechanism
    Wang, Yangtao
    Xie, Yanzhao
    Zeng, Jiangfeng
    Wang, Hanpin
    Fan, Lisheng
    Song, Yufan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [44] DCMFNet: Deep Cross-Modal Fusion Network for Referring Image Segmentation with Iterative Gated Fusion
    Huang, Zhen
    Xue, Mingcheng
    Liu, Yu
    Xu, Kaiping
    Li, Jiangquan
    Yu, Chenyang
    PROCEEDINGS OF THE 50TH GRAPHICS INTERFACE CONFERENCE, GI 2024, 2024
  • [45] Unified Adversarial Patch for Visible-Infrared Cross-Modal Attacks in the Physical World
    Wei, Xingxing
    Huang, Yao
    Sun, Yitong
    Yu, Jie
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (04) : 2348 - 2363
  • [46] Mixed-scale cross-modal fusion network for referring image segmentation
    Pan, Xiong
    Xie, Xuemei
    Yang, Jianxiu
    NEUROCOMPUTING, 2025, 614
  • [47] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [48] Cascaded Cross-modal Alignment for Visible-Infrared Person Re-Identification
    Li, Zhaohui
    Wang, Qiangchang
    Chen, Lu
    Zhang, Xinxin
    Yin, Yilong
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [49] Infrared-Visible Cross-Modal Person Re-Identification with an X Modality
    Li, Diangang
    Wei, Xing
    Hong, Xiaopeng
    Gong, Yihong
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 4610 - 4617
  • [50] Infrared-visible cross-modal person re-identification via dual-attention collaborative learning
    Li, Yunshang
    Chen, Ying
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2022, 109