Snippet-level Supervised Contrastive Learning-based Transformer for Temporal Action Detection

Cited by: 0
Authors
Xu, Ronghai [1 ]
Liu, Changhong [1 ]
Chen, Yong [2 ]
Lei, Zhenchun [1 ]
Affiliations
[1] Jiangxi Normal Univ, Sch Comp & Informat Engn, Nanchang, Jiangxi, Peoples R China
[2] Nanchang Inst Technol, Sch Business Adm, Nanchang, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
temporal action detection; supervised contrastive learning; transformer; action proposal generation;
DOI
10.1109/IJCNN54540.2023.10191802
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Anchor-free temporal action detection methods have recently achieved good results in handling flexible action boundaries and varying action durations. However, anchor-free methods predict action boundaries from local features, so they are sensitive to noise and prone to generating incomplete action proposals. Moreover, there are long-term temporal dependencies between actions, and temporal semantic consistency between action primitives within the same action class. We therefore propose a snippet-level supervised contrastive learning-based transformer (SSCL-T) model for temporal action detection, which learns both local and global temporal semantic relationships in actions. The model learns the local temporal dynamic features of actions through local temporal coding and uses a transformer to model the global semantic dependencies between long-term actions. In addition, we use action class information to learn high-level semantic features of actions through a snippet-level supervised contrastive learning scheme, which forces the temporal dynamic features of the same action class to be as close as possible and the features of different action classes to be as far apart as possible, thereby enabling accurate prediction of action boundaries. The model is evaluated on two benchmark datasets, ActivityNet-v1.3 and THUMOS14. The experimental results show that the proposed model improves significantly on both datasets. Compared with the baseline method BMN, the average mAP increases by 2.91% and 8.4% on ActivityNet-v1.3 and THUMOS14, respectively.
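The snippet-level contrastive idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the widely used supervised contrastive (SupCon-style) loss, cosine similarity between snippet features, and a temperature of 0.1. The function name `snippet_supcon_loss` and all parameters are hypothetical.

```python
import numpy as np

def snippet_supcon_loss(features, labels, temperature=0.1):
    """SupCon-style loss over snippet features (illustrative assumption).

    features: (N, D) array, one row per video snippet.
    labels:   (N,) integer action class of each snippet.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]
    if n < 2:
        return 0.0  # no pairs to contrast

    # L2-normalize so the dot product is cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    logits = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other snippets that share the anchor's action class.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Stable log-sum-exp over all other snippets (self excluded).
    masked = np.where(not_self, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = logits - log_denom

    # Mean log-probability of positives; skip anchors with no positive.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_sum = (log_prob * positives).sum(axis=1)
    return float(-(pos_sum[valid] / pos_count[valid]).mean())
```

Under these assumptions, the loss is small when snippets of the same class lie close together in feature space and grows when same-class snippets are scattered among snippets of other classes.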
Pages: 8