Snippet-level Supervised Contrastive Learning-based Transformer for Temporal Action Detection

Cited by: 0

Authors:

Xu, Ronghai [1]

Liu, Changhong [1]

Chen, Yong [2]

Lei, Zhenchun [1]

Affiliations:

[1] Jiangxi Normal Univ, Sch Comp & Informat Engn, Nanchang, Jiangxi, Peoples R China

[2] Nanchang Inst Technol, Sch Business Adm, Nanchang, Jiangxi, Peoples R China

Source:

2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

temporal action detection; supervised contrastive learning; transformer; action proposal generation;

DOI:

10.1109/IJCNN54540.2023.10191802

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Anchor-free temporal action detection methods have recently achieved good results in handling the flexible boundaries and varying durations of actions. However, anchor-free methods predict action boundaries from local features, so they are sensitive to noise and prone to generating incomplete action proposals. Moreover, there are long-term temporal dependencies between actions, as well as temporal semantic consistency between action primitives within the same action class. We therefore propose a snippet-level supervised contrastive learning-based transformer (SSCL-T) model for temporal action detection, which learns both local and global semantic temporal relationships in actions. The model learns the local temporal dynamic features of actions through local temporal coding and uses a transformer to model the global semantic dependencies between long-term actions. In addition, we use action class information to learn high-level semantic features of actions through a snippet-level supervised contrastive learning objective, which pulls the temporal dynamic features of same-class actions as close together as possible and pushes the features of different classes as far apart as possible, enabling more accurate prediction of action boundaries. We evaluate the model on two benchmark datasets, ActivityNet-v1.3 and THUMOS14. The experimental results show significant improvements on both datasets: compared with the baseline method BMN, the average mAP increases by 2.91% and 8.4% on ActivityNet-v1.3 and THUMOS14, respectively.
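
The abstract does not give the loss formula, so the following is only a minimal sketch of what a snippet-level supervised contrastive objective could look like, assuming the widely used SupCon formulation (normalized features, a temperature-scaled softmax over all other snippets, positives defined as snippets sharing the anchor's class label). The function name `supcon_loss`, the `temperature` value, and the toy features are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supcon_loss(features, labels, temperature=0.1):
    """SupCon-style loss over snippet features (illustrative assumption,
    not the SSCL-T authors' exact objective).

    features: list of per-snippet feature vectors
    labels:   list of action class ids, one per snippet
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other snippets that share the anchor's class label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every snippet.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp over all snippets except the anchor.
        others = [sims[j] for j in range(n) if j != i]
        peak = max(others)
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in others))
        # Average negative log-probability of the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


if __name__ == "__main__":
    # Toy snippets: two near the x-axis, two near the y-axis.
    snippets = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print("labels match clusters:", supcon_loss(snippets, [0, 0, 1, 1]))
    print("labels cut across clusters:", supcon_loss(snippets, [0, 1, 0, 1]))
```

When the class labels agree with the feature clusters the loss is close to zero, and it grows when same-class snippets lie far apart, which is the "same class close, different classes far" behaviour the abstract describes.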

Pages: 8
