Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

Cited by: 40
Authors
Gao, Junyu [1, 2]
Chen, Mengyuan [1, 2]
Xu, Changsheng [1, 2, 3]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
DOI
10.1109/CVPR52688.2022.01937
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We target the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training. Despite recent progress, existing methods mainly embrace a localization-by-classification paradigm and overlook the fruitful fine-grained temporal distinctions between video sequences, and thus suffer from severe ambiguity in classification learning and classification-to-localization adaptation. This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. Specifically, under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed: Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting. The first considers the relations of various action/background proposals by using match, insert, and delete operators, and the second mines the longest common subsequences between two videos. The two contrasting modules enhance each other and jointly offer discriminative action-background separation and a reduced task gap between classification and localization. Extensive experiments show that our method achieves state-of-the-art performance on two popular benchmarks.
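The two dynamic-programming ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the constant gap cost, the smoothing parameter gamma, and the log-sum-exp smooth minimum are all assumptions chosen for illustration. The first function is the classic (hard) longest-common-subsequence recurrence; the second is an edit-distance-style recurrence over match, delete, and insert operators in which the hard minimum is replaced by a smooth one, which is one common way to make such a recurrence differentiable.

```python
import math


def lcs_length(a, b):
    """Classic, non-differentiable longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def soft_min(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))).

    Computed in a shifted form for numerical stability; it approaches
    min(values) as gamma tends to 0.
    """
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_edit_distance(cost, gap=1.0, gamma=0.1):
    """Edit-distance-style recurrence with a smooth minimum.

    cost[i][j] is an assumed pairwise match cost between element i of the
    first sequence and element j of the second; gap is an assumed constant
    cost for insert and delete.
    """
    n, m = len(cost), len(cost[0])
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = soft_min(
                [
                    d[i - 1][j - 1] + cost[i - 1][j - 1],  # match
                    d[i - 1][j] + gap,                     # delete
                    d[i][j - 1] + gap,                     # insert
                ],
                gamma,
            )
    return d[n][m]
```

With a small gamma the smooth recurrence stays close to the ordinary edit distance, while larger values blend the three operators more evenly; in an automatic-differentiation framework the same recurrence would pass gradients back to the cost matrix.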
Pages: 19967 - 19977
Page count: 11
Related papers
50 results in total
  • [21] Boosting Weakly-Supervised Temporal Action Localization with Text Information
    Li, Guozhang
    Cheng, De
    Ding, Xinpeng
    Wang, Nannan
    Wang, Xiaoyu
    Gao, Xinbo
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10648 - 10657
  • [22] Feature Matching Network for Weakly-Supervised Temporal Action Localization
    Dou, Peng
    Zhou, Wei
    Liao, Zhongke
    Hu, Haifeng
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 459 - 471
  • [23] Dynamic Graph Modeling for Weakly-Supervised Temporal Action Localization
    Shi, Haichao
    Zhang, Xiao-Yu
    Li, Changsheng
    Gong, Lixing
    Li, Yong
    Bao, Yongjun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3820 - 3828
  • [24] Deep Motion Prior for Weakly-Supervised Temporal Action Localization
    Cao, Meng
    Zhang, Can
    Chen, Long
    Shou, Mike Zheng
    Zou, Yuexian
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5203 - 5213
  • [25] Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization
    Ju, Chen
    Zhao, Peisen
    Chen, Siheng
    Zhang, Ya
    Zhang, Xiaoyun
    Wang, Yanfeng
    Tian, Qi
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6688 - 6701
  • [26] Complementary adversarial mechanisms for weakly-supervised temporal action localization
    Wang, Chuanxu
    Wang, Jing
    Liu, Peng
    [J]. PATTERN RECOGNITION, 2023, 139
  • [27] A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization
    Islam, Ashraful
    Long, Chengjiang
    Radke, Richard
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1637 - 1645
  • [28] Deep Motion Prior for Weakly-Supervised Temporal Action Localization
    Cao, Meng
    Zhang, Can
    Chen, Long
    Shou, Mike Zheng
    Zou, Yuexian
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5203 - 5213
  • [29] Weakly-Supervised Temporal Action Localization with Regional Similarity Consistency
    Ren, Haoran
    Ren, Hao
    Lu, Hong
    Jin, Cheng
    [J]. MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 69 - 81
  • [30] Complementary Parts Contrastive Learning for Fine-Grained Weakly Supervised Object Co-Localization
    Ma, Lei
    Zhao, Fan
    Hong, Hanyu
    Wang, Lei
    Zhu, Ying
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6635 - 6648