SMAE-Fusion: Integrating saliency-aware masked autoencoder with hybrid attention transformer for infrared-visible image fusion

Cited by: 0
Authors
Wang, Qinghua [1 ]
Li, Ziwei [1 ,2 ,4 ]
Zhang, Shuqi [1 ]
Luo, Yuhong [1 ]
Chen, Wentao [1 ]
Wang, Tianyun [1 ]
Chi, Nan [1 ,2 ,3 ]
Dai, Qionghai [1 ,5 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai ERC LEO Satellite Commun & Applicat, Shanghai CIC LEO Satellite Commun Technol, Shanghai 200433, Peoples R China
[3] Shanghai Collaborat Innovat Ctr Low Earth Orbit Sa, Shanghai 200433, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Infrared-visible fusion; Saliency-aware masking; Hybrid CNN transformer; Progressive feature fusion; Self-supervised learning; NETWORK;
DOI
10.1016/j.inffus.2024.102841
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The objective of infrared-visible image fusion (IVIF) is to generate composite images from multiple modalities that enhance visual representation and support advanced vision tasks. However, most existing IVIF methods focus primarily on enhancing visual effects, while high-level task-driven approaches are constrained by specific perception networks and complex training strategies, limiting their flexibility across diverse scenarios. Masked image modeling has emerged as a powerful self-supervised training paradigm that learns robust feature representations applicable to various downstream tasks. To this end, this study introduces SMAE-Fusion, a saliency-aware masked autoencoder framework tailored for infrared-visible image fusion. SMAE-Fusion first adopts a saliency-aware dynamic masking strategy and applies self-supervised pre-training in the reconstruction stage to adaptively emphasize salient regions and semantic details, thereby improving feature representation and narrowing the semantic gap between upstream and downstream tasks. Moreover, the backbone incorporates a hybrid attention-enhanced transformer that promotes effective interaction between local and global features by combining convolutional and self-attention mechanisms. Additionally, a progressive feature fusion module gradually refines the integration of cross-modal features through self-attention alignment and cross-attention complementation. Comprehensive experiments on several public datasets demonstrate that SMAE-Fusion attains state-of-the-art performance in both fusion quality and downstream-task enhancement.
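The "saliency-aware dynamic masking" idea in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's actual formulation: the patch size, the variance-based saliency proxy, and the two masking ratios below are all assumptions chosen only to show how a masking ratio could be modulated per patch by saliency.

```python
import numpy as np

def saliency_masks(image, patch=8, base_ratio=0.75, salient_ratio=0.25, seed=0):
    """Hypothetical sketch of saliency-aware dynamic masking.

    Per-patch saliency is approximated here by local intensity variance
    (an assumption; the paper's saliency measure may differ). Patches
    judged salient are masked at a lower ratio, so the autoencoder's
    reconstruction effort concentrates on salient regions.
    """
    rng = np.random.default_rng(seed)
    H, W = image.shape
    ph, pw = H // patch, W // patch
    # split the image into non-overlapping patches and take each patch's variance
    patches = image[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    sal = patches.var(axis=(1, 3))
    # patches above the median saliency keep more content (lower mask ratio)
    ratio = np.where(sal > np.median(sal), salient_ratio, base_ratio)
    mask = rng.random((ph, pw)) < ratio  # True = patch is masked out
    return mask, sal
```

A real implementation would operate on token embeddings inside the masked autoencoder rather than raw pixels, but the per-patch ratio modulation follows the same pattern.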
Pages: 15