Deeper Siamese network with multi-level feature fusion for real-time visual tracking

Cited by: 11
Authors
Yang, Kang [1, 2, 3]
Song, Huihui [1, 2, 3]
Zhang, Kaihua [1, 2, 3]
Fan, Jiaqing [1, 2, 3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, B DAT, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, B DAT, Nanjing, Jiangsu, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, CICAEET, Nanjing, Jiangsu, Peoples R China
Keywords
feature extraction; image fusion; object tracking; image representation; neural nets; real-time systems; image resolution; deeper Siamese network; multilevel feature fusion; real-time visual tracking; SiamN-based trackers; target representation; tracking performance degeneration; deeper ResNet; feature map resolution; feature agglomeration module; visual object tracking;
DOI
10.1049/el.2019.1041
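Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract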
In recent years, Siamese networks (SiamN) have achieved great success in visual tracking in terms of both accuracy and efficiency. Nevertheless, most SiamN-based trackers employ a shallow network such as AlexNet and use its top-layer features as the target representation. These features are less discriminative, which usually degrades tracking performance under large deformation and in the presence of similar distractors. A straightforward remedy is to replace the SiamN backbone with a deeper ResNet. However, this does not boost performance much, because the high-level feature maps have low resolution and lose useful spatial details. To address this issue, the authors propose a lightweight yet effective feature agglomeration module (FAM) that adaptively fuses low-level and high-level features for robust tracking. Specifically, they first develop a generalised non-local attention module to enhance the discriminative capability of high-level semantic features. Then, they design an inception-like module to enhance the representative power of low-level features, which carry more spatial details. Both types of features are then adaptively fused in the FAM so that they complement each other. Extensive evaluations on the OTB-2015 and VOT2017 benchmarks demonstrate that the proposed tracker consistently achieves favourable performance against several state-of-the-art trackers and runs at 50 fps.
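The abstract gives no implementation details, so the following is only a toy sketch of the general idea of attention-enhanced high-level features being blended with higher-resolution low-level features. Everything in it is an assumption made for illustration: the function names (`non_local_attention`, `upsample_nearest`, `adaptive_fuse`), the 1-D list-of-vectors feature layout, the dot-product similarity, and the scalar fusion logits are all hypothetical, and the inception-like low-level branch is omitted. It is not the authors' FAM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_attention(features):
    """Residual non-local attention over a list of position vectors.

    Each output vector is the input vector plus a similarity-weighted
    average of all position vectors (dot-product similarity, softmax).
    """
    out = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        context = [
            sum(w * key[c] for w, key in zip(weights, features))
            for c in range(len(query))
        ]
        out.append([q + ctx for q, ctx in zip(query, context)])
    return out


def upsample_nearest(features, factor):
    """Repeat each position `factor` times (1-D nearest-neighbour upsampling)."""
    return [list(vec) for vec in features for _ in range(factor)]


def adaptive_fuse(low, high, logits):
    """Blend two equally sized feature lists with softmax-normalised weights.

    `logits` stands in for learnable fusion parameters (an assumption).
    """
    w_low, w_high = softmax(logits)
    return [
        [w_low * a + w_high * b for a, b in zip(lo_vec, hi_vec)]
        for lo_vec, hi_vec in zip(low, high)
    ]


# Toy inputs: 4 low-level positions and 2 high-level positions, 2 channels each.
low_level = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
high_level = [[1.0, 1.0], [0.0, 2.0]]

enhanced_high = upsample_nearest(non_local_attention(high_level), factor=2)
fused = adaptive_fuse(low_level, enhanced_high, logits=[0.0, 0.0])
```

In this sketch the coarse high-level map is brought to the low-level resolution before blending, so the fused output keeps the finer spatial grid; equal logits reduce the fusion to a plain average.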
Pages: 742 - 744
Number of pages: 3
Related papers
50 in total
  • [31] A Twofold Siamese Network for Real-Time Object Tracking
    He, Anfeng
    Luo, Chong
    Tian, Xinmei
    Zeng, Wenjun
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4834 - 4843
  • [32] A Siamese Network for real-time object tracking on CPU
    Xing, Daitao
    Evangeliou, Nikolaos
    Tsoukalas, Athanasios
    Tzes, Anthony
    [J]. SOFTWARE IMPACTS, 2022, 12
  • [33] MAFFNet: real-time multi-level attention feature fusion network with RGB-D semantic segmentation for autonomous driving
    Lv, Tongfei
    Zhang, Yu
    Luo, Lin
    Gao, Xiaorong
    [J]. APPLIED OPTICS, 2022, 61 (09) : 2219 - 2229
  • [34] SiamCAN: Real-Time Visual Tracking Based on Siamese Center-Aware Network
    Zhou, Wenzhang
    Wen, Longyin
    Zhang, Libo
    Du, Dawei
    Luo, Tiejian
    Wu, Yanjun
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3597 - 3609
  • [36] IoU-guided Siamese region proposal network for real-time visual tracking
    Zhou, Lifang
    He, Yu
    Li, Weisheng
    Mi, Jianxun
    Lei, Bangjun
    [J]. NEUROCOMPUTING, 2021, 462 : 544 - 554
  • [37] End-to-end feature fusion Siamese network for adaptive visual tracking
    Guo, Dongyan
    Wang, Jun
    Zhao, Weixuan
    Cui, Ying
    Wang, Zhenhua
    Chen, Shengyong
    [J]. IET IMAGE PROCESSING, 2021, 15 (01) : 91 - 100
  • [38] SIAMESE FEATURE PYRAMID NETWORK FOR VISUAL TRACKING
    Chang, Shuo
    Zhang, Fan
    Huang, Sai
    Yao, Yuanyuan
    Zhao, Xiaotong
    Feng, Zhiyong
    [J]. 2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS IN CHINA (ICCC WORKSHOPS), 2019, : 164 - 168
  • [39] Multi-level feature fusion network for crowd counting
    Wang, Luyang
    Li, Yun
    Peng, Sifan
    Tang, Xiao
    Yin, Baoqun
    [J]. IET COMPUTER VISION, 2021, 15 (01) : 60 - 72
  • [40] Extremely Tiny Siamese Networks with Multi-level Fusions for Visual Object Tracking
    Cao, Yi
    Ji, Hongbing
    Zhang, Wenbo
    Shirani, Shahram
    [J]. 2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,