Temporal Deformable Transformer for Action Localization

Authors
Wang, Haoying [1]
Wei, Ping [1]
Liu, Meiqin [1]
Zheng, Nanning [1]
Affiliation
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Temporal Action Localization; Transformer; Deformable Attention; Video Understanding;
DOI
10.1007/978-3-031-44223-0_45
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal action localization (TAL) is a challenging task in video understanding that has received significant attention. Recently, Transformer-based models have proven effective at capturing contextual information and have achieved outstanding performance on various TAL benchmarks. However, these methods still suffer from high computational cost and rigid contextual modeling. In this paper, we propose a method that addresses these problems in Transformer-based models. Our model introduces a temporal deformable Transformer module together with a corresponding time normalization, which enables flexible aggregation of temporal context in videos and thereby yields enhanced video representations. To demonstrate the effectiveness of the proposed method, we build a Transformer-based anchor-free model with a simple prediction head, which achieves superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
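The abstract does not describe the module's internals. As a rough illustration only, here is a minimal single-head sketch of 1D (temporal) deformable attention. It follows the general deformable-attention idea: each query attends to a few learned, fractional sampling positions around its own time step, not to every frame. It is not the authors' implementation. The matrices `w_offset`, `w_attn`, and `w_value` are hypothetical stand-ins for learned parameters, fractional positions are read by linear interpolation with clamping at the sequence ends (an assumption), and the time normalization mentioned in the abstract is not modeled.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x len(x)) matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def interpolate(values: Matrix, pos: float) -> Vector:
    """Linearly interpolate a feature vector at a fractional time index.

    Positions outside [0, T-1] are clamped to the sequence boundary (assumption).
    """
    T = len(values)
    pos = min(max(pos, 0.0), T - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, T - 1)
    frac = pos - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(values[lo], values[hi])]


def temporal_deformable_attention(
    x: Matrix, w_offset: Matrix, w_attn: Matrix, w_value: Matrix
) -> Matrix:
    """Hypothetical single-head 1D deformable attention.

    x:        T x C input features (one row per time step).
    w_offset: K x C, predicts K sampling offsets per query (hypothetical).
    w_attn:   K x C, predicts K attention logits per query (hypothetical).
    w_value:  C x C value projection (hypothetical).
    """
    values = [matvec(w_value, xt) for xt in x]
    out = []
    for t, xt in enumerate(x):
        # The reference point is the query's own time step t; offsets shift it.
        offsets = matvec(w_offset, xt)
        weights = softmax(matvec(w_attn, xt))
        acc = [0.0] * len(values[0])
        for dk, ak in zip(offsets, weights):
            sample = interpolate(values, t + dk)
            acc = [a + ak * s for a, s in zip(acc, sample)]
        out.append(acc)
    return out


if __name__ == "__main__":
    feats = [[1.0], [2.0], [3.0], [4.0]]
    # One sampling point (K = 1) whose offset is 0.5 * feature value.
    print(temporal_deformable_attention(feats, [[0.5]], [[0.0]], [[1.0]]))
```

With one sampling point per query, the example prints `[[1.5], [3.0], [4.0], [4.0]]`: time step 0 samples halfway between frames 0 and 1, and later steps are pushed toward, and clamped at, the last frame. Because each query reads only K sampled positions rather than all T frames, the cost of this sketch grows as O(T·K·C) instead of the O(T²·C) of full self-attention, which is one plausible reading of the efficiency concern raised in the abstract.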
Pages: 563-575
Number of pages: 13