Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0
Authors
Zhang, Xin [1]
Jiao, Licheng [1]
Li, Lingling [1]
Liu, Xu [1]
Liu, Fang [1]
Ma, Wenping [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;
DOI
10.1109/TGRS.2024.3441038
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Subject classification codes
0708; 070902;
Abstract
Satellite video tracking tasks are often characterized by blurred foreground boundaries in vast scenes, a wide range of targets varying in scale, and irregular changes in appearance. These challenges significantly impact the optimization of robust tracker performance. Therefore, it is imperative to extract diverse features with dynamic adaptive learning capabilities for the target being tracked in each sequence. In this article, we explore a novel adaptive multi-scale Transformer (MT) tracker for satellite videos to exploit the potential spatiotemporal information of the target effectively. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced, which analyzes the adaptive learning capacity for the dynamic target. It automatically analyzes the weight proportion of the different attentions in each specific sequence through learnable parameters. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module accomplishes precise annotation of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker's performance on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that the performance of our model is comparable to that of state-of-the-art trackers.
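To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It illustrates (a) a stage schedule in which spatial resolution shrinks and channel count doubles at each stage, and (b) a weighted sum of several attention outputs whose weights come from learnable scalars. The abstract does not state the reduction factor, the number of stages, or how the weights are normalized, so the factor of 2, the three stages, and the softmax normalization are all assumptions. The function names `stage_shapes`, `softmax`, and `fuse_attention_outputs` are hypothetical.

```python
import math


def stage_shapes(height, width, channels, num_stages, reduction=2):
    """Return (H, W, C) for each stage.

    Assumption: every stage divides H and W by `reduction` and doubles C.
    The abstract only says "spatial reduction and channel doubling".
    """
    shapes = []
    h, w, c = height, width, channels
    for _ in range(num_stages):
        shapes.append((h, w, c))
        h, w, c = h // reduction, w // reduction, c * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_outputs(outputs, logits):
    """Weighted sum of equally sized feature vectors.

    `logits` stands in for the learnable parameters; normalizing them
    with softmax is an assumption made for this sketch.
    """
    weights = softmax(logits)
    dim = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(dim)
    ]


# Hypothetical input: a 64x64 feature map with 32 channels, three stages.
shapes = stage_shapes(64, 64, 32, num_stages=3)

# Two toy attention outputs fused with equal (untrained) logits.
fused = fuse_attention_outputs([[1.0, 3.0], [3.0, 5.0]], logits=[0.0, 0.0])
```

With equal logits the fusion reduces to a plain average; during training the logits would move so that, for a given sequence, the more useful attention branch receives a larger share.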
Pages: 16