Multi-scale feature extraction and fusion with attention interaction for RGB-T tracking

Cited by: 0
Authors
Xing, Haijiao [1 ]
Wei, Wei [1 ]
Zhang, Lei [1 ]
Zhang, Yanning [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Single-object tracking; RGB-T tracking; Feature fusion; Siamese networks; Tracking;
DOI
10.1016/j.patcog.2024.110917
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
RGB-T single-object tracking aims to track objects using both RGB images and thermal infrared (TIR) images. Although Siamese-based RGB-T trackers have an advantage in tracking speed, their accuracy still cannot match that of other state-of-the-art trackers (e.g., MDNet). In this study, we revisit existing Siamese-based RGB-T trackers and find that this performance gap stems from insufficient feature fusion between the RGB and TIR images, as well as incomplete interaction between the template frame and the search frame. Motivated by this, we propose a multi-scale feature extraction and fusion network with Temporal-Spatial Memory (MFATrack). Instead of fusing the RGB and TIR images with a single-scale feature map, or with only the high-level features of a multi-scale feature map, MFATrack adopts a new fusion strategy that fuses features from all scales, capturing contextual information in shallow layers and details in the deep layer. To learn features better suited to the tracking task, MFATrack fuses features across several consecutive frames. In addition, we propose a self-attention interaction module designed specifically for the search frame; it highlights the features in the search frame that are relevant to the target and thus facilitates rapid convergence for target localization. Experimental results demonstrate that the proposed MFATrack is not only fast but also achieves better tracking accuracy than competing methods, including MDNet-based methods and other Siamese-based trackers.
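To make the two mechanisms described in the abstract concrete (fusing RGB and TIR features at every backbone scale rather than only the deepest one, and re-weighting search-frame features with self-attention), here is a minimal PyTorch sketch. The module names (MultiScaleFusion, SearchSelfAttention), the channel sizes, and the concatenate-and-convolve fusion rule are illustrative assumptions, not the authors' actual MFATrack implementation; the temporal fusion across consecutive frames and the template-search interaction are omitted for brevity.

# Minimal sketch of (1) per-scale RGB/TIR fusion and (2) self-attention
# over search-frame features. Names, channel sizes, and the concat+conv
# fusion rule are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Fuse RGB and TIR feature maps at every scale with a 1x1 conv each."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels
        )

    def forward(self, rgb_feats, tir_feats):
        # rgb_feats / tir_feats: lists of [B, C_i, H_i, W_i] maps, shallow to deep
        return [f(torch.cat([r, t], dim=1))
                for f, r, t in zip(self.fuse, rgb_feats, tir_feats)]

class SearchSelfAttention(nn.Module):
    """Self-attention over search-frame features to highlight target-relevant regions."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # [B, H*W, C]
        out, _ = self.attn(tokens, tokens, tokens)  # attend across locations
        return out.transpose(1, 2).reshape(b, c, h, w)

# Usage: three backbone scales per modality; attend over the deepest fused map.
rgb = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
tir = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
fused = MultiScaleFusion()(rgb, tir)
attended = SearchSelfAttention()(fused[-1])
print([f.shape for f in fused], attended.shape)

Both modules are shape-preserving, so the attended map can feed any downstream correlation or localization head; the attention step simply re-weights search-frame locations before matching against the template.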
Pages: 12
Related Papers
50 records in total
  • [1] Multi-modal neural networks with multi-scale RGB-T fusion for semantic segmentation
    Lyu, Y.
    Schiopu, I.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (18): 920-922
  • [2] Pedestrian detection algorithm based on multi-scale feature extraction and attention feature fusion
    Xia, Hao
    Ma, Jun
    Ou, Jiayu
    Lv, Xinyao
    Bai, Chengjie
    DIGITAL SIGNAL PROCESSING, 2022, 121
  • [3] MSEDNet: Multi-scale fusion and edge-supervised network for RGB-T salient object detection
    Peng, Daogang
    Zhou, Weiyi
    Pan, Junzhen
    Wang, Danhao
    NEURAL NETWORKS, 2024, 171: 410-422
  • [4] Attention interaction based RGB-T tracking method
    Wang W.
    Fu F.
    Lei H.
    Tang Z.
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2024, 32 (3): 435-444
  • [5] SSD with multi-scale feature fusion and attention mechanism
    Liu, Qiang
    Dong, Lijun
    Zeng, Zhigao
    Zhu, Wenqiu
    Zhu, Yanhui
    Meng, Chen
    SCIENTIFIC REPORTS, 2023, 13 (1)
  • [6] Unsupervised RGB-T object tracking with attentional multi-modal feature fusion
    Li, Shenglan
    Yao, Rui
    Zhou, Yong
    Zhu, Hancheng
    Liu, Bing
    Zhao, Jiaqi
    Shao, Zhiwen
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82: 23595-23613
  • [7] Feature enhancement and fusion for RGB-T salient object detection
    Sun, Fengming
    Zhang, Kang
    Yuan, Xia
    Zhao, Chunxia
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 1300-1304
  • [8] Multi-Scale Feature Fusion Attention Network for Building Extraction in Remote Sensing Images
    Liu, Jia
    Gu, Hang
    Li, Zuhe
    Chen, Hongyang
    Chen, Hao
    ELECTRONICS, 2024, 13 (05)
  • [9] Revisiting Feature Fusion for RGB-T Salient Object Detection
    Zhang, Qiang
    Xiao, Tonglin
    Huang, Nianchang
    Zhang, Dingwen
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (5): 1804-1818