Snippet-level Supervised Contrastive Learning-based Transformer for Temporal Action Detection

Cited by: 0
Authors
Xu, Ronghai [1]
Liu, Changhong [1]
Chen, Yong [2]
Lei, Zhenchun [1]
Affiliations
[1] Jiangxi Normal Univ, Sch Comp & Informat Engn, Nanchang, Jiangxi, Peoples R China
[2] Nanchang Inst Technol, Sch Business Adm, Nanchang, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
temporal action detection; supervised contrastive learning; transformer; action proposal generation;
DOI
10.1109/IJCNN54540.2023.10191802
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Anchor-free temporal action detection methods have recently achieved good results in handling the flexible boundaries and varying durations of actions. However, anchor-free methods predict action boundaries from local features, which makes them sensitive to noise and prone to generating incomplete action proposals. Moreover, there are long-term temporal dependencies between actions, and temporal semantic consistency between action primitives within the same action class. We therefore propose a snippet-level supervised contrastive learning-based transformer (SSCL-T) model for temporal action detection, which learns both local and global semantic temporal relationships in actions. The model learns the local temporal dynamic features of actions through local temporal coding and uses a transformer to model the global semantic dependencies between long-term actions. In addition, we use action class information to learn high-level semantic features of actions by designing a snippet-level supervised contrastive learning scheme, which forces the temporal dynamic features of actions in the same class to be as close as possible and the features of actions in different classes to be as far apart as possible, thereby enabling accurate prediction of action boundaries. The model is evaluated on two benchmark datasets, ActivityNet-v1.3 and THUMOS14. The experimental results show that the proposed model improves significantly on both datasets. Compared with the benchmark method BMN, the average mAP increases by 2.91% and 8.4% on ActivityNet-v1.3 and THUMOS14, respectively.
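The abstract does not give the loss formula, so the following is only a minimal sketch of the general idea of pulling same-class snippet features together and pushing different-class features apart. It assumes the widely used supervised contrastive loss form (temperature-scaled cosine similarity with a softmax over all other snippets) rather than the exact SSCL-T objective. The function name `supervised_contrastive_loss`, the `temperature` value, and the toy feature vectors are all hypothetical.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    features: list of snippet feature vectors (lists of floats).
    labels:   list of action class ids, one per snippet.
    NOTE: illustrative assumption, not the paper's exact formulation.
    """
    # L2-normalize each snippet feature so dot products are cosine similarities.
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other snippets that share the anchor's action class.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other snippet.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy snippets: two near the x-axis, two near the y-axis.
snippets = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = supervised_contrastive_loss(snippets, [0, 0, 1, 1])
loss_inconsistent = supervised_contrastive_loss(snippets, [0, 1, 0, 1])
```

With these toy inputs, the loss is lower when the class labels agree with the feature geometry (`loss_consistent`) than when they cut across it (`loss_inconsistent`), which is the behavior the abstract describes as bringing same-class features close and different-class features far apart.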
Pages: 8
Related papers
50 in total
  • [1] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
    Zhang, Can
    Cao, Meng
    Yang, Dongming
    Chen, Jie
    Zou, Yuexian
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 16005 - 16014
  • [2] Weakly Supervised Temporal Action Localization Based on Contrastive Learning
    Hou Y.
    Li Y.
    Guo Z.
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2023, 56 (01): : 73 - 80
  • [3] Prototype contrastive learning for point-supervised temporal action detection
    Li, Ping
    Cao, Jiachen
    Ye, Xingchao
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [4] Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization
    Shao, Yuxiang
    Zhang, Feifei
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6717 - 6729
  • [5] Semi-Supervised Action Recognition with Temporal Contrastive Learning
    Singh, Ankit
    Chakraborty, Omprakash
    Varshney, Ashutosh
    Panda, Rameswar
    Feris, Rogerio
    Saenko, Kate
    Das, Abir
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 10384 - 10394
  • [6] SWIN transformer based contrastive self-supervised learning for animal detection and classification
    Agilandeeswari, L.
    Meena, S. Divya
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (07) : 10445 - 10470
  • [7] SWIN transformer based contrastive self-supervised learning for animal detection and classification
    L. Agilandeeswari
    S. Divya Meena
    Multimedia Tools and Applications, 2023, 82 : 10445 - 10470
  • [8] Supervised Contrastive Learning-Based Classification for Hyperspectral Image
    Huang, Lingbo
    Chen, Yushi
    He, Xin
    Ghamisi, Pedram
    REMOTE SENSING, 2022, 14 (21)
  • [9] Temporal-masked skeleton-based action recognition with supervised contrastive learning
    Zhao, Zhifeng
    Chen, Guodong
    Lin, Yuxiang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (05) : 2267 - 2275
  • [10] Temporal-masked skeleton-based action recognition with supervised contrastive learning
    Zhifeng Zhao
    Guodong Chen
    Yuxiang Lin
    Signal, Image and Video Processing, 2023, 17 : 2267 - 2275