Bridging Search Region Interaction with Template for RGB-T Tracking

Citations: 44
Authors
Hui, Tianrui [1 ,2 ]
Xun, Zizheng [3 ,5 ]
Peng, Fengguang [3 ,5 ]
Huang, Junshi [4 ]
Wei, Xiaoming [4 ]
Wei, Xiaolin [4 ]
Dai, Jiao [1 ,2 ]
Han, Jizhong [1 ,2 ]
Liu, Si [3 ,5 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[4] Meituan, Beijing, Peoples R China
[5] Beihang Univ, Hangzhou Innovat Inst, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.01310
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
RGB-T tracking aims to leverage the mutual enhancement and complementarity of RGB and TIR modalities to improve tracking in various scenarios, where cross-modal interaction is the key component. Some previous methods concatenate the RGB and TIR search region features directly to perform a coarse interaction process, which introduces redundant background noise. Many other methods sample candidate boxes from search frames and conduct various fusion approaches on isolated pairs of RGB and TIR boxes, which limits the cross-modal interaction to local regions and results in inadequate context modeling. To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts. Original templates are also updated with enriched multimodal contexts from the template medium. Our TBSI module is inserted into a ViT backbone for joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance. Code is available at https://github.com/RyanHTR/TBSI.
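The gather-and-distribute idea in the abstract can be illustrated with a minimal, self-contained numpy sketch. This is not the authors' implementation (which uses multi-head attention with learned projections inside ViT blocks); it is a single-head, projection-free cross-attention toy where the template tokens first gather context from both search regions and then each search region reads back from the enriched template medium. All function names (`tbsi_sketch`, `cross_attend`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Scaled dot-product cross-attention (single head, no learned projections).

    queries: (Nq, d), keys_values: (Nk, d) -> returns (Nq, d).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

def tbsi_sketch(template, search_rgb, search_tir):
    """Toy template-bridged interaction between two modalities.

    Gather: template tokens pull target-relevant context from the TIR and
    RGB search regions, yielding an enriched multimodal template medium.
    Distribute: each search region then attends to that medium, so the two
    modalities interact indirectly, bridged by the template rather than by
    direct RGB-TIR concatenation.
    """
    # Gather phase: enrich the template with both modalities (residual updates).
    medium = template + cross_attend(template, search_tir)
    medium = medium + cross_attend(medium, search_rgb)
    # Distribute phase: search regions read the enriched template medium.
    search_rgb = search_rgb + cross_attend(search_rgb, medium)
    search_tir = search_tir + cross_attend(search_tir, medium)
    return medium, search_rgb, search_tir
```

Routing all cross-modal exchange through a small set of template tokens keeps the interaction focused on target-relevant context, whereas attending directly between full search regions would also mix in background clutter.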
Pages: 13630-13639 (10 pages)
Related Papers (50 total)
  • [31] MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
    Wang, Xiao
    Shu, Xiujun
    Zhang, Shiliang
    Jiang, Bo
    Wang, Yaowei
    Tian, Yonghong
    Wu, Feng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4335 - 4348
  • [32] Anchor free based Siamese network tracker with transformer for RGB-T tracking
    Liangsong Fan
    Pyeoungkee Kim
    Scientific Reports, 13
  • [33] FADSiamNet: feature affinity drift siamese network for RGB-T target tracking
    Li, Haiyan
    Cao, Yonghui
    Guo, Lei
    Chen, Quan
    Ding, Zhaisheng
    Xie, Shidong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, : 2779 - 2799
  • [34] Unified Single-Stage Transformer Network for Efficient RGB-T Tracking
    Xia, Jianqiang
    Shi, Dianxi
    Song, Ke
    Song, Linna
    Wang, Xiaolei
    Jin, Songchang
    Zhao, Chenran
    Cheng, Yu
    Jin, Lei
    Zhu, Zheng
    Li, Jianan
    Wang, Gang
    Xing, Junliang
    Zhao, Jian
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1471 - 1479
  • [35] Learning Multi-domain Convolutional Network for RGB-T Visual Tracking
    Zhang, Xingming
    Zhang, Xuehan
    Du, Xuedan
    Zhou, Xiangming
    Yin, Jun
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018,
  • [36] Learning Soft-Consistent Correlation Filters for RGB-T Object Tracking
    Wang, Yulong
    Li, Chenglong
    Tang, Jin
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT IV, 2018, 11259 : 295 - 306
  • [37] Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos
    Xu, Qingyu
    Wang, Longguang
    Sheng, Weidong
    Wang, Yingqian
    Xiao, Chao
    Ma, Chao
    An, Wei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9383 - 9397
  • [38] Fast RGB-T Tracking via Cross-Modal Correlation Filters
    Zhai, Sulan
    Shao, Pengpeng
    Liang, Xinyan
    Wang, Xin
    NEUROCOMPUTING, 2019, 334 : 172 - 181
  • [39] Multi-Modal Fusion for End-to-End RGB-T Tracking
    Zhang, Lichao
    Danelljan, Martin
    Gonzalez-Garcia, Abel
    van de Weijer, Joost
    Khan, Fahad Shahbaz
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2252 - 2261
  • [40] Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
    Luo, Yang
    Guo, Xiqing
    Dong, Mingtao
    Yu, Jin
    SENSORS, 2023, 23 (14)