CMFuse: Cross-Modal Features Mixing via Convolution and MLP for Infrared and Visible Image Fusion

Cited: 0
Authors
Cai, Zhao [1 ,2 ]
Ma, Yong [1 ]
Huang, Jun [1 ]
Mei, Xiaoguang [1 ]
Fan, Fan [1 ]
Zhao, Zhiqing [1 ]
Affiliations
[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China
[2] Hubei Engn Univ, Coll Phys & Elect Informat Engn, Xiaogan 432100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; Feature extraction; Vectors; Transformers; Convolution; Image reconstruction; Lighting; Feature mixing; image fusion; long-range dependencies; multilayer perceptron (MLP); NETWORK; ARCHITECTURE; NEST;
DOI
10.1109/JSEN.2024.3410387
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
In infrared and visible image fusion, recently proposed methods predominantly employ the self-attention mechanism to explore long-range dependencies among image features, aiming to mitigate the loss of global relationships. However, these methods concentrate primarily on capturing dependencies within each modality and pay minimal attention to cross-modal interaction, leading to unsatisfactory global contrast in the fusion results. Furthermore, the weak inductive bias of the self-attention mechanism constrains its ability to capture local features, potentially causing a loss of details and texture in the fused image. In this article, we explore a simple but effective network structure that equivalently models long-range dependencies and propose a cross-modal global feature mixing network called CMFuse. Specifically, we propose intra- and inter-modality mixing modules (Intra-Conv-MLP and Inter-Conv-MLP), which consist of residual multilayer perceptrons (MLPs) and depthwise separable convolutions. These modules are designed to extract and integrate complementary information within and between modalities, leveraging the global receptive field of the MLP. Moreover, multiple residual dense blocks (RDBs) are employed to enhance the ability of the network to extract local fine-grained features, thereby enriching textures in the fused images. Extensive experiments demonstrate that CMFuse outperforms existing state-of-the-art methods. Furthermore, our model significantly enhances the performance of high-level vision tasks. Our code and pre-trained model will be published at https://github.com/zc617/Conv-MLP-Fusion.
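The abstract names two building blocks: depthwise separable convolutions and residual channel-mixing MLPs, applied within one modality (Intra-Conv-MLP) and across the concatenated infrared/visible features (Inter-Conv-MLP). As a rough illustration only, here is a minimal NumPy sketch of those ideas; the actual CMFuse layer sizes, activations, and wiring are not given in this record, so all function names, shapes, and the concatenate-then-split inter-modality scheme below are assumptions, not the authors' implementation.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with 'same' padding.
    x: (C, H, W) feature map, kernels: (C, 3, 3), one kernel per channel."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def residual_mlp_mix(x, w1, w2):
    """Channel-mixing MLP with a residual connection: each pixel's channel
    vector is mixed by a two-layer MLP (ReLU hidden layer), so every output
    channel sees all input channels -- a global receptive field over channels.
    x: (C, H, W), w1: (C, hidden), w2: (hidden, C)."""
    C, H, W = x.shape
    tokens = x.reshape(C, -1).T            # (H*W, C): one token per pixel
    hidden = np.maximum(tokens @ w1, 0.0)  # ReLU hidden layer
    return x + (hidden @ w2).T.reshape(C, H, W)

def inter_modality_mix(ir_feat, vis_feat, w1, w2):
    """Hypothetical inter-modality mixing: stack infrared and visible
    features along the channel axis, mix them jointly, then split back.
    ir_feat, vis_feat: (C, H, W); w1: (2C, hidden), w2: (hidden, 2C)."""
    joint = np.concatenate([ir_feat, vis_feat], axis=0)   # (2C, H, W)
    mixed = residual_mlp_mix(joint, w1, w2)
    c = ir_feat.shape[0]
    return mixed[:c], mixed[c:]
```

A depthwise 3x3 pass followed by such a channel MLP approximates one "mixing" step: the convolution gathers local spatial context per channel, while the MLP exchanges information across all channels (and, in the inter-modality variant, across both modalities) at every spatial position.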
Pages: 24152-24167
Page count: 16