Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Citations: 0
Authors
Zhang, Xin [1]
Jiao, Licheng [1]
Li, Lingling [1]
Liu, Xu [1]
Liu, Fang [1]
Ma, Wenping [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING
DOI
10.1109/TGRS.2024.3441038
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges make it significantly harder to optimize a robust tracker. It is therefore imperative to extract diverse features, with dynamic adaptive learning capabilities, for the target being tracked in each sequence. In this article, we present a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that effectively exploits the potential spatiotemporal information of the target. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced to provide adaptive learning capacity for the dynamic target. It automatically determines the relative weights of the different attentions for each specific sequence through learnable parameters. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module achieves precise regression of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that our model performs comparably to state-of-the-art trackers.
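Two of the mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the number of stages, the spatial reduction factor, or how the attention weights are normalized. The halving factor, the softmax normalization, and the names `stage_shapes`, `softmax`, and `fuse_attentions` are all assumptions made for illustration.

```python
import math


def stage_shapes(height, width, channels, num_stages):
    """Return (H, W, C) for each stage.

    Assumption: each stage halves the spatial size and doubles the channels.
    The abstract only says "spatial reduction and channel doubling".
    """
    shapes = []
    for _ in range(num_stages):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attentions(outputs, logits):
    """Weighted sum of several attention outputs (equal-length vectors).

    `logits` stands in for the learnable parameters; softmax normalization
    is an assumption, not something stated in the abstract.
    """
    weights = softmax(logits)
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(dim)]


if __name__ == "__main__":
    print(stage_shapes(64, 64, 32, 3))
    print(fuse_attentions([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With equal logits the two hypothetical attention outputs are averaged. During training, the logits would shift so that the attentions most useful for a given sequence receive more weight.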
Pages: 16