Boosting Weakly-Supervised Temporal Action Localization with Text Information

被引:5
|
作者
Li, Guozhang [1 ]
Cheng, De [1 ]
Ding, Xinpeng [2 ]
Wang, Nannan [1 ]
Wang, Xiaoyu [3 ]
Gao, Xinbo [1 ,4 ]
机构
[1] Xidian Univ, Xian, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong Shenzhen, Shenzhen, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Chongqing, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
10.1109/CVPR52729.2023.01026
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Due to the lack of temporal annotation, current Weakly-supervised Temporal Action Localization (WTAL) methods are generally stuck into over-complete or incomplete localization. In this paper, we aim to leverage the text information to boost WTAL from two aspects, i.e., (a) the discriminative objective to enlarge the inter-class difference, thus reducing the over-complete; (b) the generative objective to enhance the intra-class integrity, thus finding more complete temporal boundaries. For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments. Without the temporal annotation of actions, TSM compares the text query with the entire videos across the dataset to mine the best matching segments while ignoring irrelevant ones. Due to the shared sub-actions in different categories of videos, merely applying TSM is too strict to neglect the semantic-related segments, which results in incomplete localization. We further introduce a generative objective named Video-text Language Completion (VLC), which focuses on all semantic-related segments from videos to complete the text sentence. We achieve the state-of-the-art performance on THUMOS14 and ActivityNet1.3. Surprisingly, we also find our proposed method can be seamlessly applied to existing methods, and improve their performances with a clear margin. The code is available at https://github.com/lgzlIlIlI/Boosting-WTAL.
引用
收藏
页码:10648 / 10657
页数:10
相关论文
共 50 条
  • [1] Weakly-supervised temporal action localization: a survey
    AbdulRahman Baraka
    Mohd Halim Mohd Noor
    [J]. Neural Computing and Applications, 2022, 34 : 8479 - 8499
  • [2] Weakly-supervised temporal action localization: a survey
    Baraka, AbdulRahman
    Noor, Mohd Halim Mohd
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (11): : 8479 - 8499
  • [3] Temporal RPN Learning for Weakly-Supervised Temporal Action Localization
    Huang, Jing
    Kong, Ming
    Chen, Luyuan
    Liang, Tian
    Zhu, Qiang
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [4] ACTION RELATIONAL GRAPH FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION
    Cheng, Yi
    Sun, Ying
    Lin, Dongyun
    Lim, Joo-Hwee
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2563 - 2567
  • [5] Action Coherence Network for Weakly-Supervised Temporal Action Localization
    Zhai, Yuanhao
    Wang, Le
    Tang, Wei
    Zhang, Qilin
    Zheng, Nanning
    Hua, Gang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1857 - 1870
  • [6] Weakly-Supervised Temporal Action Localization by Background Suppression
    Liu, Mengxue
    Gao, Xiangjun
    Ge, Fangzhen
    Liu, Huaiyu
    Li, Wenjing
    [J]. 2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 7074 - 7081
  • [7] Weakly-supervised Temporal Action Localization by Uncertainty Modeling
    Lee, Pilhyeon
    Wang, Jinglu
    Lu, Yan
    Byun, Hyeran
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1854 - 1862
  • [8] AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos
    Shou, Zheng
    Gao, Hang
    Zhang, Lei
    Miyazawa, Kazuyuki
    Chang, Shih-Fu
    [J]. COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 : 162 - 179
  • [9] Background Suppression Network for Weakly-Supervised Temporal Action Localization
    Lee, Pilhyeon
    Uh, Youngjung
    Byun, Hyeran
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11320 - 11327
  • [10] Feature Matching Network for Weakly-Supervised Temporal Action Localization
    Dou, Peng
    Zhou, Wei
    Liao, Zhongke
    Hu, Haifeng
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 459 - 471