Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0

Authors:

Zhang, Xin [1]

Jiao, Licheng [1]

Li, Lingling [1]

Liu, Xu [1]

Liu, Fang [1]

Ma, Wenping [1]

Yang, Shuyuan [1]

Affiliations:

[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;

DOI:

10.1109/TGRS.2024.3441038

Chinese Library Classification:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges make it significantly harder to optimize a robust tracker. It is therefore imperative to extract diverse features with dynamic adaptive learning capabilities for the target tracked in each sequence. In this article, we present a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that effectively exploits the potential spatiotemporal information of the target. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced to strengthen adaptive learning for the dynamic target. Through learnable parameters, it automatically determines the weight proportion of the different attentions in each specific sequence. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module achieves precise estimation of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that our model performs comparably to state-of-the-art trackers.
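The two mechanisms named in the abstract, stage-by-stage spatial reduction with channel doubling and learnable weighting of several attention outputs, can be illustrated with a small shape-and-arithmetic sketch. This is not the authors' implementation. The function names, the reduction factor of 2 per stage, and the softmax normalization of the learnable scalars are all assumptions made for illustration, because the abstract does not specify them.

```python
import math


def stage_shapes(height, width, channels, num_stages, reduction=2):
    """Return the (height, width, channels) feature shape after each stage.

    Assumption: every stage divides both spatial sides by `reduction`
    and doubles the channel count. The abstract only states "spatial
    reduction and channel doubling"; the factor of 2 is hypothetical.
    """
    shapes = []
    for _ in range(num_stages):
        height //= reduction
        width //= reduction
        channels *= 2
        shapes.append((height, width, channels))
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_outputs(outputs, logits):
    """Blend several attention outputs with normalized learnable weights.

    `outputs` is a list of equal-length feature vectors, one per
    attention branch. `logits` holds one learnable scalar per branch.
    Assumption: the scalars are normalized with a softmax; the abstract
    only says the weight proportions are learned.
    """
    weights = softmax(logits)
    length = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Three stages starting from a hypothetical 64 x 64 x 32 feature map.
    print(stage_shapes(64, 64, 32, 3))
    # Two attention branches with equal learnable scalars.
    print(fuse_attention_outputs([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

With equal scalars the fusion reduces to a plain average of the branches. During training, unequal scalars would let one attention branch dominate for a given sequence, which is one plausible reading of the "weight proportion" described in the abstract.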

Pages: 16
