TQRFormer: Tubelet query recollection transformer for action detection

Cited by: 1
Authors
Wang, Xiangyang [1 ]
Yang, Kun [1 ]
Ding, Qiang [2 ]
Wang, Rui [1 ]
Sun, Jinhua [2 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
[2] Fudan Univ, Natl Childrens Med Ctr, Dept Psychol Med, Childrens Hosp, Shanghai 201102, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Spatio-temporal action detection; Transformer; Query recollection; Matching strategy; Long-term context;
DOI
10.1016/j.imavis.2024.105059
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Spatio-temporal action detection aims to precisely localize actions in videos while predicting their categories. The existing solution, TubeR (Zhao et al., 2022), directly detects action tubes in videos, recognizing and localizing actions with a unified representation. However, a challenge arises during the decoding stage: the model's action detection performance gradually degrades, specifically the confidence associated with detected actions. In this paper, we propose TQRFormer: Tubelet Query Recollection Transformer, which enables each subsequent decoder stage to obtain information from the previous stage. Specifically, we design Query Recollection Attention to correct errors and output synthesized results, effectively breaking the limitations of sequential decoding. During training, TubeR (Zhao et al., 2022) generates a limited number of positive-sample queries through a one-to-one matching strategy, which can hamper the effectiveness of training with positive samples. To increase the number of positive samples, we propose a stage matching approach that combines one-to-many matching and one-to-one matching without additional queries, boosting the overall number of positive samples for improved training outcomes. We also propose a more elegant classification head that encodes the start- and end-frame information of the tubelets, eliminating the need for a separate action switch. TQRFormer outperforms previous state-of-the-art methods on public action detection datasets, including AVA, UCF101-24, JHMDB-21 and MultiSports. The code will be available at https://github.com/ykyk000/TQRFormer.
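The stage matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`one_to_many_match`, `one_to_one_match`), the greedy assignment, and the cost matrix are all hypothetical stand-ins that only show why combining one-to-many matching in earlier stages with one-to-one matching in the final stage yields more positive-sample query pairs without adding queries.

```python
# Illustrative sketch (not the paper's code): two matching stages over the
# same query set. Lower cost = better query/ground-truth fit.

def one_to_many_match(cost, k):
    """Assign each ground-truth tubelet (row) to its k lowest-cost queries."""
    positives = set()
    for gt_idx, row in enumerate(cost):
        best = sorted(range(len(row)), key=lambda q: row[q])[:k]
        positives.update((gt_idx, q) for q in best)
    return positives

def one_to_one_match(cost):
    """Greedy one-to-one assignment: each ground truth claims one unused query."""
    positives, used = set(), set()
    for gt_idx, row in enumerate(cost):
        for q in sorted(range(len(row)), key=lambda j: row[j]):
            if q not in used:
                positives.add((gt_idx, q))
                used.add(q)
                break
    return positives

# Toy cost matrix: 2 ground-truth tubelets x 5 queries.
cost = [[0.1, 0.9, 0.3, 0.8, 0.7],
        [0.6, 0.2, 0.9, 0.4, 0.5]]

many = one_to_many_match(cost, k=2)  # earlier stages: 4 positive pairs
one = one_to_one_match(cost)         # final stage: 2 unique pairs
print(len(many), len(one))           # -> 4 2
```

With the same five queries, the one-to-many stage supervises twice as many positive pairs as the one-to-one stage, which is the effect the abstract attributes to stage matching.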
Pages: 11