TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Cited by: 5
Authors
Li, Jing [1]
Yang, Bin [2]
Bai, Lu [3, 4]
Dou, Hao [5]
Li, Chang [6]
Ma, Lingfei [7]
Affiliations
[1] Cent Univ Finance & Econ, Sch Informat, Beijing 102206, Peoples R China
[2] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China
[3] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[4] Cent Univ Finance & Econ, Beijing 100081, Peoples R China
[5] China Elect Technol Grp Corp, Res Inst 38, Hefei 230088, Peoples R China
[6] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[7] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image fusion; infrared image; transformer; visible image; MULTI-FOCUS; NETWORK; FRAMEWORK;
DOI
10.1109/TIM.2023.3312755
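Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract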
Existing transformer-based infrared and visible image fusion methods mainly focus on the intra-modal self-attention correlation within each image. They neglect the inter-modal discrepancies at the same position of the two source images, which arise because the information carried by the infrared token and the visible token at that position is unbalanced. We therefore develop a pure transformer fusion model that reconstructs the fused image in the token dimension: it perceives intra-modal long-range dependencies through the transformer's self-attention mechanism and also captures the attentive inter-modal correlation in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when the corresponding infrared and visible tokens are fused, learnable attentive weights dynamically measure the correlation of inter-modal tokens at the same position. Concretely, because of their modal difference, infrared and visible tokens are first processed by two independent transformers to extract intra-modal long-range dependencies. The corresponding infrared and visible tokens are then fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlation of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model compared with nine state-of-the-art methods.
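The token-level fusion described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the two transformer encoders are omitted entirely, the scoring vectors `w_ir` and `w_vis` are hypothetical stand-ins for the learnable attentive weights, and averaging the results of several patch sizes is only an assumed way to combine grains. All function names are invented for illustration.

```python
import math


def split_into_tokens(image, patch):
    """Cut a 2-D image (list of rows) into non-overlapping patch x patch
    tokens, each flattened in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def tokens_to_image(tokens, height, width, patch):
    """Inverse of split_into_tokens: place each token back at its position."""
    image = [[0.0] * width for _ in range(height)]
    per_row = width // patch
    for index, token in enumerate(tokens):
        top, left = (index // per_row) * patch, (index % per_row) * patch
        for offset, value in enumerate(token):
            image[top + offset // patch][left + offset % patch] = value
    return image


def fuse_tokens(ir_tokens, vis_tokens, w_ir, w_vis):
    """Fuse same-position tokens with a two-way softmax over per-token scores.

    w_ir and w_vis are hypothetical scoring vectors; in a trained model they
    would be learned, here the caller simply supplies them.
    """
    fused = []
    for ir, vis in zip(ir_tokens, vis_tokens):
        score_ir = sum(w * x for w, x in zip(w_ir, ir))
        score_vis = sum(w * x for w, x in zip(w_vis, vis))
        # Subtract the larger score before exponentiating for stability.
        top = max(score_ir, score_vis)
        e_ir, e_vis = math.exp(score_ir - top), math.exp(score_vis - top)
        a_ir = e_ir / (e_ir + e_vis)
        a_vis = 1.0 - a_ir
        fused.append([a_ir * i + a_vis * v for i, v in zip(ir, vis)])
    return fused


def multigrained_fuse(ir_image, vis_image, patch_sizes):
    """Fuse at several token sizes and average the reconstructed images
    (the averaging step is an assumption made for this sketch)."""
    height, width = len(ir_image), len(ir_image[0])
    total = [[0.0] * width for _ in range(height)]
    for patch in patch_sizes:
        ones = [1.0] * (patch * patch)  # placeholder weights, not learned
        fused = fuse_tokens(split_into_tokens(ir_image, patch),
                            split_into_tokens(vis_image, patch), ones, ones)
        image = tokens_to_image(fused, height, width, patch)
        for r in range(height):
            for c in range(width):
                total[r][c] += image[r][c] / len(patch_sizes)
    return total
```

With all-zero scoring vectors the two scores tie, so `fuse_tokens` reduces to a plain average of the two tokens; unequal scores shift the weight toward the modality with the higher score at that position.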
Pages: 14