Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

Times cited: 40
Authors
Gao, Junyu [1, 2]
Chen, Mengyuan [1, 2]
Xu, Changsheng [1, 2, 3]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
DOI
10.1109/CVPR52688.2022.01937
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We target the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training. Despite recent progress, existing methods mainly embrace a localization-by-classification paradigm and overlook the rich fine-grained temporal distinctions between video sequences, and thus suffer from severe ambiguity in both classification learning and classification-to-localization adaptation. This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias for WSAL and helps identify coherent action instances. Specifically, under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed: Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting. The first considers the relations among various action/background proposals using match, insert, and delete operators; the second mines the longest common subsequences between two videos. The two contrasting modules enhance each other and jointly provide discriminative action-background separation and a reduced task gap between classification and localization. Extensive experiments show that our method achieves state-of-the-art performance on two popular benchmarks.
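The abstract describes the two objectives only at a high level. The sketch below illustrates the general idea under stated assumptions and is not the authors' formulation: it assumes each video is a sequence of feature vectors (for example, snippets or proposals), replaces the hard max of classic dynamic programming with a log-sum-exp smooth max (temperature `gamma`) so the recurrence is differentiable, and charges a constant gap penalty `gap` for insert/delete moves. The function names, `gamma`, and `gap` are hypothetical. Plain Python floats are used for readability; a trainable version would be written in an autodiff framework so gradients reach the features.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def smooth_max(values: Sequence[float], gamma: float) -> float:
    """Log-sum-exp relaxation of max; tends to max(values) as gamma -> 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    top = max(values)
    # Subtracting the max keeps every exponent <= 0, avoiding overflow.
    return top + gamma * math.log(sum(math.exp((v - top) / gamma) for v in values))


def cosine_similarity_matrix(x: Sequence[Sequence[float]],
                             y: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise cosine similarity between the feature sequences of two videos."""
    def norm(v: Sequence[float]) -> float:
        return math.sqrt(sum(a * a for a in v)) or 1.0  # guard zero vectors
    return [[sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v)) for v in y]
            for u in x]


def smooth_alignment_score(sim: Matrix, gap: float = -0.5,
                           gamma: float = 0.1) -> float:
    """Smoothed global-alignment score using match, insert and delete moves.

    D[i][j] is the soft-best score for aligning the first i elements of one
    sequence with the first j elements of the other (hypothetical FSD-like sketch).
    """
    n, m = len(sim), len(sim[0])
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap  # delete every element of the first sequence
    for j in range(1, m + 1):
        D[0][j] = j * gap  # insert every element of the second sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = smooth_max([
                D[i - 1][j - 1] + sim[i - 1][j - 1],  # match element i with j
                D[i - 1][j] + gap,                    # delete element i
                D[i][j - 1] + gap,                    # insert element j
            ], gamma)
    return D[n][m]


def smooth_lcs_score(sim: Matrix, gamma: float = 0.1) -> float:
    """Smoothed longest-common-subsequence score.

    With a 0/1 similarity matrix and gamma -> 0 this reduces to the classic
    LCS length; skipping an element costs nothing, unlike the alignment score.
    """
    n, m = len(sim), len(sim[0])
    C = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i][j] = smooth_max([
                C[i - 1][j - 1] + sim[i - 1][j - 1],  # extend the subsequence
                C[i - 1][j],                          # skip element i
                C[i][j - 1],                          # skip element j
            ], gamma)
    return C[n][m]
```

In use, `sim` would be built with `cosine_similarity_matrix` from two videos' feature sequences, and each scorer returns one scalar per video pair. A contrastive loss (not shown, and its exact form is an assumption here) would then raise these scores for video pairs that share an action label and lower them otherwise. The design difference between the two scorers is the gap penalty: the alignment score pays for every unmatched element, whereas the LCS score ignores unmatched elements and rewards only the best common subsequence.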
Pages: 19967-19977
Number of pages: 11