Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

被引:40
|
作者
Gao, Junyu [1 ,2 ]
Chen, Mengyuan [1 ,2 ]
Xu, Changsheng [1 ,2 ,3 ]
机构
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
基金
中国国家自然科学基金; 北京市自然科学基金;
关键词
D O I
10.1109/CVPR52688.2022.01937
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We target at the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training. Despite the recent progress, existing methods mainly embrace a localization-by-classification paradigm and overlook the fruitful fine-grained temporal distinctions between video sequences, thus suffering from severe ambiguity in classification learning and classification-to-localization adaption. This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. Specifically, under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting, where the first one considers the relations of various action/background proposals by using match, insert, and delete operators and the second one mines the longest common subsequences between two videos. Both contrasting modules can enhance each other and jointly enjoy the merits of discriminative action-background separation and alleviated task gap between classification and localization. Extensive experiments show that our method achieves state-of-the-art performance on two popular benchmarks.
引用
收藏
页码:19967 / 19977
页数:11
相关论文
共 50 条
  • [1] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
    Zhang, Can
    Cao, Meng
    Yang, Dongming
    Chen, Jie
    Zou, Yuexian
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 16005 - 16014
  • [2] Temporal RPN Learning for Weakly-Supervised Temporal Action Localization
    Huang, Jing
    Kong, Ming
    Chen, Luyuan
    Liang, Tian
    Zhu, Qiang
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [3] Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions
    Li, Zhi
    He, Lu
    Xu, Huijuan
    [J]. COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 567 - 584
  • [4] Actionness Inconsistency-Guided Contrastive Learning for Weakly-Supervised Temporal Action Localization
    Li, Zhilin
    Wang, Zilei
    Liu, Qinying
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 1513 - 1521
  • [5] Semantic and Temporal Contextual Correlation Learning for Weakly-Supervised Temporal Action Localization
    Fu, Jie
    Gao, Junyu
    Xu, Changsheng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12427 - 12443
  • [6] Weakly Supervised Temporal Action Localization Based on Contrastive Learning
    Hou, Yonghong
    Li, Yueyang
    Guo, Zihui
    [J]. Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2023, 56 (01): : 73 - 80
  • [7] Vectorized Evidential Learning for Weakly-Supervised Temporal Action Localization
    Gao, Junyu
    Chen, Mengyuan
    Xu, Changsheng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15949 - 15963
  • [8] Weakly-supervised temporal action localization: a survey
    AbdulRahman Baraka
    Mohd Halim Mohd Noor
    [J]. Neural Computing and Applications, 2022, 34 : 8479 - 8499
  • [9] Weakly-supervised temporal action localization: a survey
    Baraka, AbdulRahman
    Noor, Mohd Halim Mohd
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (11): : 8479 - 8499
  • [10] Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
    Lee, Pilhyeon
    Byun, Hyeran
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13628 - 13637