Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking

Cited: 1
|
Authors
Deng, Anping [1 ,2 ]
Han, Guangliang [1 ]
Zhang, Zhongbo [3 ]
Chen, Dianbing [1 ]
Ma, Tianjiao [1 ]
Liu, Zhichao [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys CIOMP, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[3] Jilin Univ, Sch Math, Changchun 130012, Peoples R China
Keywords
visual object tracking; UAV tracking; efficient match transformer; attention method;
DOI
10.3390/rs16060961
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science];
Discipline Classification Code
08 ; 0830 ;
Abstract
Visual object tracking is a key technology used in unmanned aerial vehicles (UAVs) to achieve autonomous navigation. In recent years, with the rapid development of deep learning, tracking algorithms based on Siamese neural networks have received widespread attention. However, because of complex and diverse tracking scenarios and limited computational resources, most existing tracking algorithms struggle to maintain real-time, stable operation while improving tracking performance. Studying efficient, fast tracking frameworks and enhancing algorithms' robustness to complex scenarios has therefore become crucial. This paper proposes a cross-parallel attention and efficient match transformer for aerial tracking (SiamEMT). First, we carefully designed a cross-parallel attention mechanism that encodes global feature information and achieves cross-dimensional interaction and feature-correlation aggregation via parallel branches, highlighting feature saliency, reducing globally redundant information, and improving the tracker's ability to distinguish targets from backgrounds. Second, we implemented an efficient match transformer for feature matching. This network uses parallel, lightweight, multi-head attention to pass template information to the search-region features, better matching the global similarity between the template and search regions and improving the algorithm's perception of target location and feature information. Experiments on multiple public UAV benchmarks verify the accuracy and robustness of the proposed tracker in drone tracking scenarios. In addition, on the embedded artificial intelligence (AI) platform AGX Xavier, our algorithm achieves real-time tracking speed, indicating that it can be effectively applied to UAV tracking scenarios.
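The template-to-search matching described in the abstract (multi-head attention that passes template information into the search-region features) can be sketched roughly as below. This is a minimal illustrative sketch in NumPy, not the paper's implementation: the token counts, channel dimension, head count, and function names are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search, template, num_heads=4):
    """Fuse template information into search-region features via
    multi-head cross-attention: queries come from the search region,
    keys/values from the template. Shapes: search (n_s, d),
    template (n_t, d); returns (n_s, d). Illustrative sketch only."""
    n_s, d = search.shape
    n_t, _ = template.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    # Split channels into heads: (heads, tokens, d_h).
    q = search.reshape(n_s, num_heads, d_h).transpose(1, 0, 2)
    k = template.reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    v = k
    # Scaled dot-product similarity between search and template tokens.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (heads, n_s, n_t)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                    # (heads, n_s, d_h)
    # Merge heads back into the channel dimension.
    return out.transpose(1, 0, 2).reshape(n_s, d)

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 32))    # e.g. flattened search-region tokens
template = rng.standard_normal((16, 32))  # flattened template tokens
fused = cross_attention(search, template)
print(fused.shape)  # (64, 32)
```

In a real tracker the query, key, and value tensors would each pass through learned linear projections, and the fused features would feed the classification and regression heads; those parts are omitted here for brevity.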
Pages: 21
Related Papers
50 items in total
  • [31] SiamHOT: Siamese High-Order Transformer for Aerial Tracking
    Chen, Qiqi
    Zuo, Yujia
    Wang, Bo
    Liu, Jinghong
    Liu, Chenglong
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [32] Visual tracking using transformer with a combination of convolution and attention
    Wang, Yuxuan
    Yan, Liping
    Feng, Zihang
    Xia, Yuanqing
    Xiao, Bo
    IMAGE AND VISION COMPUTING, 2023, 137
  • [33] Transformer Tracking Algorithm Integrating Fast Edge Attention
    Xue, Zihan
    Ge, Haibo
    Wang, Shuxian
    An, Yu
    Yang, Yudi
    COMPUTER ENGINEERING AND APPLICATIONS, 2025, 61 (01) : 221 - 231
  • [34] PatchFormer: An Efficient Point Transformer with Patch Attention
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11789 - 11798
  • [35] An Efficient Transformer with Distance-aware Attention
    Duan, Gaoxiang
    Zheng, Xiaoying
    Zhu, Yongxin
    Ren, Tao
    Yan, Yan
    2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS, 2023, : 96 - 101
  • [36] Cross-Attention Transformer for Video Interpolation
    Kim, Hannah Halin
    Yu, Shuzhi
    Yuan, Shuai
    Tomasi, Carlo
    COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 325 - 342
  • [37] Cross Attention with Monotonic Alignment for Speech Transformer
    Zhao, Yingzhu
    Ni, Chongjia
    Leung, Cheung-Chi
    Joty, Shafiq
    Chng, Eng Siong
    Ma, Bin
    INTERSPEECH 2020, 2020, : 5031 - 5035
  • [38] Cross on Cross Attention: Deep Fusion Transformer for Image Captioning
    Zhang, Jing
    Xie, Yingshuai
    Ding, Weichao
    Wang, Zhe
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4257 - 4268
  • [39] Resource-Efficient RGBD Aerial Tracking
    Yang, Jinyu
    Gao, Shang
    Li, Zhe
    Zheng, Feng
    Leonardis, Ales
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13374 - 13383
  • [40] Hybrid Ladder Transformers with Efficient Parallel-Cross Attention for Medical Image Segmentation
    Luo, Haozhe
    Yu, Changdong
    Selvan, Raghavendra
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 808 - 819