Bridging Search Region Interaction with Template for RGB-T Tracking

Citations: 44
Authors
Hui, Tianrui [1 ,2 ]
Xun, Zizheng [3 ,5 ]
Peng, Fengguang [3 ,5 ]
Huang, Junshi [4 ]
Wei, Xiaoming [4 ]
Wei, Xiaolin [4 ]
Dai, Jiao [1 ,2 ]
Han, Jizhong [1 ,2 ]
Liu, Si [3 ,5 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[4] Meituan, Beijing, Peoples R China
[5] Beihang Univ, Hangzhou Innovat Inst, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.01310
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
RGB-T tracking aims to leverage the mutual enhancement and complementarity of RGB and TIR modalities to improve tracking in various scenarios, where cross-modal interaction is the key component. Some previous methods concatenate the RGB and TIR search region features directly to perform a coarse interaction process, which introduces redundant background noise. Many other methods sample candidate boxes from search frames and conduct various fusion approaches on isolated pairs of RGB and TIR boxes, which limits the cross-modal interaction to local regions and results in inadequate context modeling. To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts. Original templates are also updated with enriched multimodal contexts from the template medium. Our TBSI module is inserted into a ViT backbone for joint feature extraction, search-template matching, and cross-modal interaction. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance. Code is available at https://github.com/RyanHTR/TBSI.
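The gather-and-distribute idea in the abstract can be illustrated with a minimal, self-contained numpy sketch. This is not the authors' implementation (which uses multi-head attention with learned projections inside ViT blocks); it is a single-head, projection-free cross-attention toy where the template tokens first gather context from both search regions and then each search region reads back from the enriched template medium. All function names (`tbsi_sketch`, `cross_attend`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Scaled dot-product cross-attention (single head, no learned projections).

    queries: (Nq, d), keys_values: (Nk, d) -> returns (Nq, d).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

def tbsi_sketch(template, search_rgb, search_tir):
    """Toy template-bridged interaction between two modalities.

    Gather: template tokens pull target-relevant context from the TIR and
    RGB search regions, yielding an enriched multimodal template medium.
    Distribute: each search region then attends to that medium, so the two
    modalities interact indirectly, bridged by the template rather than by
    direct RGB-TIR concatenation.
    """
    # Gather phase: enrich the template with both modalities (residual updates).
    medium = template + cross_attend(template, search_tir)
    medium = medium + cross_attend(medium, search_rgb)
    # Distribute phase: search regions read the enriched template medium.
    search_rgb = search_rgb + cross_attend(search_rgb, medium)
    search_tir = search_tir + cross_attend(search_tir, medium)
    return medium, search_rgb, search_tir
```

Routing all cross-modal exchange through a small set of template tokens keeps the interaction focused on target-relevant context, whereas attending directly between full search regions would also mix in background clutter.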
Pages: 13630-13639 (10 pages)
Related Papers (50 total)
  • [31] MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
    Wang, Xiao
    Shu, Xiujun
    Zhang, Shiliang
    Jiang, Bo
    Wang, Yaowei
    Tian, Yonghong
    Wu, Feng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4335 - 4348
  • [32] Anchor free based Siamese network tracker with transformer for RGB-T tracking
    Liangsong Fan
    Pyeoungkee Kim
    Scientific Reports, 13
  • [33] FADSiamNet: feature affinity drift siamese network for RGB-T target tracking
    Li, Haiyan
    Cao, Yonghui
    Guo, Lei
    Chen, Quan
    Ding, Zhaisheng
    Xie, Shidong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, : 2779 - 2799
  • [34] Unified Single-Stage Transformer Network for Efficient RGB-T Tracking
    Xia, Jianqiang
    Shi, Dianxi
    Song, Ke
    Song, Linna
    Wang, Xiaolei
    Jin, Songchang
    Zhao, Chenran
    Cheng, Yu
    Jin, Lei
    Zhu, Zheng
    Li, Jianan
    Wang, Gang
    Xing, Junliang
    Zhao, Jian
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1471 - 1479
  • [35] Learning Multi-domain Convolutional Network for RGB-T Visual Tracking
    Zhang, Xingming
    Zhang, Xuehan
    Du, Xuedan
    Zhou, Xiangming
    Yin, Jun
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018,
  • [36] Learning Soft-Consistent Correlation Filters for RGB-T Object Tracking
    Wang, Yulong
    Li, Chenglong
    Tang, Jin
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT IV, 2018, 11259 : 295 - 306
  • [37] Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos
    Xu, Qingyu
    Wang, Longguang
    Sheng, Weidong
    Wang, Yingqian
    Xiao, Chao
    Ma, Chao
    An, Wei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9383 - 9397
  • [38] Fast RGB-T Tracking via Cross-Modal Correlation Filters
    Zhai, Sulan
    Shao, Pengpeng
    Liang, Xinyan
    Wang, Xin
    NEUROCOMPUTING, 2019, 334 : 172 - 181
  • [39] Multi-Modal Fusion for End-to-End RGB-T Tracking
    Zhang, Lichao
    Danelljan, Martin
    Gonzalez-Garcia, Abel
    van de Weijer, Joost
    Khan, Fahad Shahbaz
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2252 - 2261
  • [40] Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
    Luo, Yang
    Guo, Xiqing
    Dong, Mingtao
    Yu, Jin
    SENSORS, 2023, 23 (14)