SMAE-Fusion: Integrating saliency-aware masked autoencoder with hybrid attention transformer for infrared-visible image fusion

Times Cited: 0
Authors
Wang, Qinghua [1 ]
Li, Ziwei [1 ,2 ,4 ]
Zhang, Shuqi [1 ]
Luo, Yuhong [1 ]
Chen, Wentao [1 ]
Wang, Tianyun [1 ]
Chi, Nan [1 ,2 ,3 ]
Dai, Qionghai [1 ,5 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai ERC LEO Satellite Commun & Applicat, Shanghai CIC LEO Satellite Commun Technol, Shanghai 200433, Peoples R China
[3] Shanghai Collaborat Innovat Ctr Low Earth Orbit Sa, Shanghai 200433, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Infrared-visible fusion; Saliency-aware masking; Hybrid CNN transformer; Progressive feature fusion; Self-supervised learning; Network
DOI
10.1016/j.inffus.2024.102841
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The objective of infrared-visible image fusion (IVIF) is to generate composite images from multiple modalities that enhance visual representation and support advanced vision tasks. However, most existing IVIF methods focus primarily on enhancing visual effects, while high-level task-driven approaches are constrained by specific perception networks and complex training strategies, limiting their flexibility across diverse scenarios. Masked image modeling has emerged as a powerful self-supervised training paradigm that learns robust feature representations applicable to various downstream tasks. To this end, this study introduces SMAE-Fusion, a saliency-aware masked autoencoder framework tailored for infrared-visible image fusion. First, SMAE-Fusion adopts a saliency-aware dynamic masking strategy and applies a self-supervised pre-training paradigm in the reconstruction stage to adaptively emphasize salient regions and semantic details, thereby improving feature representation and mitigating the semantic gap between upstream and downstream tasks. Moreover, the backbone of SMAE-Fusion incorporates a hybrid attention-enhanced transformer that promotes effective interaction between local and global features by leveraging both convolutional and self-attention mechanisms. Additionally, a progressive feature fusion module gradually optimizes the integration of cross-modal features through self-attention alignment and cross-attention complementation. Comprehensive experiments on various public datasets demonstrate that SMAE-Fusion attains state-of-the-art performance in both fusion quality and downstream task enhancement.
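The paper does not spell out its masking rule in the abstract, but the idea of a saliency-aware dynamic masking strategy can be sketched as follows. This minimal Python example is an illustration only: it assumes gradient magnitude as a stand-in saliency measure, and it biases patch masking toward high-saliency regions so the autoencoder is forced to reconstruct salient content. The function names, the patch size, and the saliency proxy are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def saliency_map(img):
    """Hypothetical saliency proxy: local gradient magnitude of the image."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.abs(gx) + np.abs(gy)

def saliency_aware_mask(img, patch=4, mask_ratio=0.5, rng=None):
    """Mask a fixed fraction of patches, sampled with probability proportional
    to per-patch saliency, so salient regions are preferentially masked (and
    must therefore be reconstructed). Returns a boolean patch grid
    (True = masked)."""
    rng = np.random.default_rng(rng)
    h, w = img.shape
    ph, pw = h // patch, w // patch
    sal = saliency_map(img)
    # Average saliency within each non-overlapping patch.
    patch_sal = (sal[:ph * patch, :pw * patch]
                 .reshape(ph, patch, pw, patch)
                 .mean(axis=(1, 3)))
    n_mask = int(mask_ratio * ph * pw)
    # Sampling weights: higher saliency -> higher masking probability.
    weights = patch_sal.ravel() + 1e-6
    probs = weights / weights.sum()
    idx = rng.choice(ph * pw, size=n_mask, replace=False, p=probs)
    mask = np.zeros(ph * pw, dtype=bool)
    mask[idx] = True
    return mask.reshape(ph, pw)
```

In a full MAE-style pipeline, the masked patches would be dropped before the encoder and reconstructed by the decoder; the bias toward salient patches is what distinguishes this sketch from uniform random masking.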
Pages: 15