SMAE-Fusion: Integrating saliency-aware masked autoencoder with hybrid attention transformer for infrared-visible image fusion

Times Cited: 0
Authors
Wang, Qinghua [1]
Li, Ziwei [1,2,4]
Zhang, Shuqi [1]
Luo, Yuhong [1]
Chen, Wentao [1]
Wang, Tianyun [1]
Chi, Nan [1,2,3]
Dai, Qionghai [1,5]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai ERC LEO Satellite Commun & Applicat, Shanghai CIC LEO Satellite Commun Technol, Shanghai 200433, Peoples R China
[3] Shanghai Collaborat Innovat Ctr Low Earth Orbit Sa, Shanghai 200433, Peoples R China
[4] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Infrared-visible fusion; Saliency-aware masking; Hybrid CNN transformer; Progressive feature fusion; Self-supervised learning; NETWORK;
DOI
10.1016/j.inffus.2024.102841
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The objective of infrared-visible image fusion (IVIF) is to generate composite images from multiple modalities that enhance visual representation and support advanced vision tasks. However, most existing IVIF methods focus primarily on enhancing visual effects, while high-level task-driven approaches are constrained by specific perception networks and complex training strategies, limiting their flexibility across diverse scenarios. Masked image modeling, a powerful self-supervised training paradigm, enables the learning of robust feature representations applicable to various downstream tasks. Motivated by this, this study introduces SMAE-Fusion, a saliency-aware masked autoencoder framework tailored for infrared-visible image fusion. First, SMAE-Fusion adopts a saliency-aware dynamic masking strategy and a self-supervised pre-training paradigm in the reconstruction stage to adaptively emphasize salient regions and semantic details, thereby improving feature representation and mitigating the semantic gap between upstream and downstream tasks. Moreover, the backbone of SMAE-Fusion incorporates a hybrid attention-enhanced transformer that promotes effective interaction between local and global features by leveraging both convolutional and self-attention mechanisms. Additionally, a progressive feature fusion module gradually optimizes the integration of cross-modal features through self-attention alignment and cross-attention complementation. Comprehensive experiments on multiple public datasets demonstrate that SMAE-Fusion attains state-of-the-art performance in both fusion quality and downstream task enhancement.
Pages: 15
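This record gives no implementation details for the saliency-aware dynamic masking strategy. The following is a minimal PyTorch sketch of one plausible reading of the abstract, assuming saliency is approximated by per-patch gradient magnitude of the infrared input and that salient patches are masked preferentially so the autoencoder must reconstruct them. All function names, shapes, and the Gumbel-top-k sampling step are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def patch_saliency(ir: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Hypothetical per-patch saliency proxy: mean gradient magnitude of the
    infrared image. ir: (B, 1, H, W) with H, W divisible by `patch`.
    Returns (B, N) scores, N = (H // patch) * (W // patch)."""
    gx = ir[..., :, 1:] - ir[..., :, :-1]                 # horizontal differences
    gy = ir[..., 1:, :] - ir[..., :-1, :]                 # vertical differences
    grad = F.pad(gx.abs(), (0, 1)) + F.pad(gy.abs(), (0, 0, 0, 1))
    scores = F.avg_pool2d(grad, patch)                    # (B, 1, H/p, W/p)
    return scores.flatten(1)                              # (B, N)

def saliency_aware_mask(scores: torch.Tensor, mask_ratio: float = 0.75,
                        temperature: float = 0.1) -> torch.Tensor:
    """Sample a boolean mask (True = masked) biased toward salient patches,
    via Gumbel-top-k sampling over softmax-normalized saliency scores."""
    B, N = scores.shape
    n_mask = int(N * mask_ratio)
    probs = F.softmax(scores / temperature, dim=1)
    # Gumbel perturbation: higher-probability patches tend to rank first,
    # which draws n_mask patches without replacement, proportional to probs.
    noise = -torch.log(-torch.log(torch.rand_like(probs).clamp_min(1e-9)))
    ranking = (probs.log() + noise).argsort(dim=1, descending=True)
    mask = torch.zeros(B, N, dtype=torch.bool, device=scores.device)
    mask.scatter_(1, ranking[:, :n_mask], True)
    return mask
```

In use, `saliency_aware_mask(patch_saliency(ir_batch))` would mark the patches to drop before the encoder, as in standard masked autoencoder pre-training; the temperature controls how strongly the mask concentrates on salient regions.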
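The progressive feature fusion module is likewise described only as "self-attention alignment and cross-attention complementation." A rough sketch of one such stage, assuming token-shaped features and standard multi-head attention (all class names, dimensions, and the residual/norm layout are hypothetical), might look like this:

```python
import torch
import torch.nn as nn

class ProgressiveFusionBlock(nn.Module):
    """One hypothetical stage of progressive cross-modal fusion: self-attention
    aligns tokens within each modality, then cross-attention injects
    complementary information from the other modality."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_ir = nn.LayerNorm(dim)
        self.norm_vis = nn.LayerNorm(dim)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor):
        # Self-attention alignment within each modality (residual form).
        ir = ir + self.self_ir(ir, ir, ir, need_weights=False)[0]
        vis = vis + self.self_vis(vis, vis, vis, need_weights=False)[0]
        # Cross-attention complementation: each modality queries the other.
        ir = self.norm_ir(ir + self.cross_ir(ir, vis, vis, need_weights=False)[0])
        vis = self.norm_vis(vis + self.cross_vis(vis, ir, ir, need_weights=False)[0])
        return ir, vis

# Stacking blocks gives the "progressive" refinement; the fused representation
# could then be formed by merging the two token streams after the last stage.
fusion = nn.ModuleList([ProgressiveFusionBlock() for _ in range(3)])
ir_tok = torch.randn(2, 196, 256)    # (batch, tokens, dim), hypothetical sizes
vis_tok = torch.randn(2, 196, 256)
for block in fusion:
    ir_tok, vis_tok = block(ir_tok, vis_tok)
```

The depth of the stack and how the two streams are finally merged are not specified in this record; the three-block loop above is purely illustrative.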