TQRFormer: Tubelet query recollection transformer for action detection

Cited by: 1
Authors
Wang, Xiangyang [1]
Yang, Kun [1]
Ding, Qiang [2]
Wang, Rui [1]
Sun, Jinhua [2]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
[2] Fudan Univ, Natl Childrens Med Ctr, Dept Psychol Med, Childrens Hosp, Shanghai 201102, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Spatio-temporal action detection; Transformer; Query recollection; Matching strategy; Long-term context
DOI
10.1016/j.imavis.2024.105059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatio-temporal action detection aims to localize actions precisely while predicting their categories. An existing solution, TubeR (Zhao et al., 2022), directly detects action tubes in videos by recognizing and localizing actions with a unified representation. However, a problem arises during the decoding stage that gradually degrades the model's detection performance, specifically the confidence associated with detected actions. In this paper, we propose TQRFormer (Tubelet Query Recollection Transformer), which enables each subsequent decoder stage to obtain information from the previous stage. Specifically, we design Query Recollection Attention to correct errors and output the synthesized results, effectively breaking the limitations of sequential decoding. During training, TubeR (Zhao et al., 2022) generates only a limited number of positive sample queries through its one-to-one matching strategy, which may limit the effectiveness of training with positive samples. To increase the number of positive samples, we propose a stage matching approach that combines one-to-many matching and one-to-one matching without additional queries. We also propose a more elegant classification head that encodes the start and end frame information of the tubelets, eliminating the need for a separate action switch. TQRFormer outperforms previous state-of-the-art methods on public action detection datasets, including AVA, UCF101-24, JHMDB-21 and MultiSports. The code will be available at https://github.com/ykyk000/TQRFormer.
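To make the contrast between the two matching strategies named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the cost matrix, the function names `one_to_one_match` and `one_to_many_match`, and the top-k rule for one-to-many matching are all assumptions chosen for illustration, and the abstract does not say how the two strategies are combined across stages.

```python
from itertools import permutations

def one_to_one_match(cost):
    """Assign each ground truth exactly one distinct query, minimizing total cost.

    cost[q][g] is the (hypothetical) matching cost of query q for ground truth g.
    Brute force over assignments, so only suitable for tiny illustrative inputs.
    """
    num_queries, num_gts = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    for queries in permutations(range(num_queries), num_gts):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return {g: q for g, q in enumerate(best)}

def one_to_many_match(cost, k):
    """Give each ground truth its k lowest-cost queries (assumed top-k rule)."""
    num_queries, num_gts = len(cost), len(cost[0])
    return {
        g: sorted(range(num_queries), key=lambda q: cost[q][g])[:k]
        for g in range(num_gts)
    }

# Hypothetical costs for 4 queries (rows) and 2 ground-truth tubelets (columns).
cost = [
    [0.90, 0.20],
    [0.10, 0.80],
    [0.30, 0.70],
    [0.60, 0.25],
]

single = one_to_one_match(cost)      # one positive query per ground truth
multi = one_to_many_match(cost, 2)   # two positive queries per ground truth
num_pos_single = len(single)
num_pos_multi = sum(len(qs) for qs in multi.values())
```

With the same four queries, one-to-one matching yields two positive samples while the top-2 rule yields four, which is the effect the abstract describes: more positives without adding queries.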
Pages: 11
Related papers
50 in total
  • [1] TubeR: Tubelet Transformer for Video Action Detection
    Zhao, Jiaojiao
    Zhang, Yanyi
    Li, Xinyu
    Chen, Hao
    Shuai, Bing
    Xu, Mingze
    Liu, Chunhui
    Kundu, Kaustav
    Xiong, Yuanjun
    Modolo, Davide
    Marsic, Ivan
    Snoek, Cees G. M.
    Tighe, Joseph
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13588 - 13597
  • [2] Discriminative action tubelet detector for weakly-supervised action detection
    Lee, Jiyoung
    Kim, Seungryong
    Kim, Sunok
    Sohn, Kwanghoon
    PATTERN RECOGNITION, 2024, 155
  • [3] Recurrent Tubelet Proposal and Recognition Networks for Action Detection
    Li, Dong
    Qiu, Zhaofan
    Dai, Qi
    Yao, Ting
    Mei, Tao
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 306 - 322
  • [4] Online Action Detection by Long Short-term Transformer with Query Exemplars-transformer
    Zhang, Honglei
    Guo, Yijing
    Dui, Xiaofu
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [5] Three-Stream Action Tubelet Detector for Spatiotemporal Action Detection in Videos
    Wu, Yutang
    Wang, Hanli
    Li, Qinyu
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2018, PT II, 2018, 11165 : 296 - 306
  • [6] ENHANCED ACTION TUBELET DETECTOR FOR SPATIO-TEMPORAL VIDEO ACTION DETECTION
    Wu, Yutang
    Wang, Hanli
    Wang, Shuheng
    Li, Qinyu
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2388 - 2392
  • [7] Enhanced Training of Query-Based Object Detection via Selective Query Recollection
    Chen, Fangyi
    Zhang, Han
    Hu, Kai
    Huang, Yu-Kai
    Zhu, Chenchen
    Savvides, Marios
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23756 - 23765
  • [8] Generic Tubelet Proposals for Action Localization
    He, Jiawei
    Deng, Zhiwei
    Ibrahim, Mostafa S.
    Mori, Greg
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 343 - 351
  • [9] Action Tubelet Detector for Spatio-Temporal Action Localization
    Kalogeiton, Vicky
    Weinzaepfel, Philippe
    Ferrari, Vittorio
    Schmid, Cordelia
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4415 - 4423
  • [10] Object Detection in Videos with Tubelet Proposal Networks
    Kang, Kai
    Li, Hongsheng
    Xiao, Tong
    Ouyang, Wanli
    Yan, Junjie
    Liu, Xihui
    Wang, Xiaogang
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 889 - 897