Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0
Authors
Zhang, Xin [1]
Jiao, Licheng [1]
Li, Lingling [1]
Liu, Xu [1]
Liu, Fang [1]
Ma, Wenping [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; object tracking
DOI
10.1109/TGRS.2024.3441038
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges make it difficult to optimize a tracker for robust performance. It is therefore important to extract diverse features that adapt dynamically to the target being tracked in each sequence. In this article, we present a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that effectively exploits the spatiotemporal information of the target. Specifically, a multi-scale spatial Transformer (MSST) is designed with stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation of the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced to learn adaptively from the dynamic target. It automatically weighs the different attentions for each specific sequence through learnable parameters. Finally, a multi-scale feature (MSF) regression module is designed to improve the localization accuracy of targets with low pixel counts in satellite scenes. This module produces precise target boxes by effectively fusing features from different stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that our model performs comparably to state-of-the-art trackers.
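The two mechanisms named in the abstract, stage-by-stage spatial reduction with channel doubling and learnable weighting of several attention branches, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the reduction factor, the number of stages, or how the learnable parameters are normalized, so the factor of 2, the stage count, the softmax normalization, and the function names `stage_shapes` and `fuse_branches` are all assumptions made for illustration only.

```python
import math


def stage_shapes(height, width, channels, num_stages, reduction=2):
    """Return the (H, W, C) feature shape at each stage.

    Assumption: every stage divides the spatial size by `reduction`
    and doubles the channel count. The abstract states only
    "spatial reduction and channel doubling", not the factor.
    """
    shapes = []
    h, w, c = height, width, channels
    for _ in range(num_stages):
        shapes.append((h, w, c))
        h, w, c = h // reduction, w // reduction, c * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_outputs, logits):
    """Combine equally sized feature vectors from several attention branches.

    Assumption: the learnable parameters are scalar logits, one per
    branch, normalized with softmax before the weighted sum.
    """
    weights = softmax(logits)
    length = len(branch_outputs[0])
    return [
        sum(w * branch[i] for w, branch in zip(weights, branch_outputs))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Three hypothetical stages starting from a 64x64 map with 32 channels.
    print(stage_shapes(64, 64, 32, 3))
    # Two hypothetical attention branches with equal logits (equal weights).
    print(fuse_branches([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

In a trained model the logits would be updated by gradient descent, so each sequence could end up emphasizing a different attention branch. Here they are fixed inputs to keep the sketch self-contained.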
Pages: 16