Learning frame-level affinity with video-level labels for weakly supervised temporal action detection

Times cited: 2
Authors
Li, Bairong [1]
Zhu, Yuesheng [1]
Liu, Ruixin [1]
Weng, Zhenyu [1]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Shenzhen, Peoples R China
Keywords
Video understanding; Temporal action detection; Weakly supervised learning; SEMANTIC SEGMENTATION; ACTION RECOGNITION; NETWORK; LOCALIZATION;
DOI
10.1016/j.neucom.2021.07.059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised temporal action detection aims at localizing actions with only video-level labels rather than large numbers of frame-level labels. To this end, previous methods train a classification network to mine discernible action frames as detection results. However, the classification network is known to concentrate only on local discernible frames rather than the entire action instance. Therefore, substantial numbers of indiscernible action frames are not detected and the detection results are incomplete. To alleviate this issue, we propose a novel method to facilitate the detection of indiscernible frames based on learning frame-level affinities. In the proposed method, we design a network (named Affinity Network) for predicting affinities between pairs of adjacent frames. Then, the affinities are used as transition probabilities to propagate local responses to indiscernible frames. As a result, the responses of indiscernible frames can be enhanced and their detection can be facilitated. For learning the network, we propose strategies to synthesize frame-pair and video-pair training samples, which are conducive to learning frame-level affinities with only video-level labels. The experimental results on the THUMOS14 and ActivityNet1.2 datasets show that our framework outperforms most previous weakly supervised action detection methods, and is even competitive with some fully supervised action detection methods. (c) 2021 Elsevier B.V. All rights reserved.
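The propagation idea in the abstract (adjacent-frame affinities used as transition probabilities to spread responses to neighbouring frames) can be illustrated with a minimal sketch. This record does not give the paper's actual formulation, so everything below is an assumption for illustration only: the function name propagate_responses, the self-weight of 1.0, the row normalization, and the number of steps are hypothetical choices, not the authors' method.

```python
from typing import List


def propagate_responses(
    responses: List[float],
    affinities: List[float],
    steps: int = 1,
) -> List[float]:
    """Spread per-frame responses to adjacent frames (illustrative only).

    responses:  one score per frame, e.g. a class activation value.
    affinities: affinities[t] is the affinity between frame t and frame t + 1,
                assumed to lie in [0, 1]; its length must be len(responses) - 1.
    steps:      number of propagation rounds (hypothetical parameter).
    """
    n = len(responses)
    if len(affinities) != max(n - 1, 0):
        raise ValueError("need exactly one affinity per pair of adjacent frames")

    current = list(responses)
    for _ in range(steps):
        updated = []
        for t in range(n):
            # Assumption: each frame keeps its own response with weight 1.0.
            total = current[t]
            weight = 1.0
            if t > 0:
                # Contribution from the previous frame, weighted by affinity.
                total += affinities[t - 1] * current[t - 1]
                weight += affinities[t - 1]
            if t < n - 1:
                # Contribution from the next frame, weighted by affinity.
                total += affinities[t] * current[t + 1]
                weight += affinities[t]
            # Row normalization turns the weights into transition probabilities.
            updated.append(total / weight)
        current = updated
    return current


if __name__ == "__main__":
    # Frame 1 is the only discernible frame; frames 0 and 2 are linked to it,
    # while frame 3 has zero affinity with frame 2 (e.g. background).
    scores = [0.0, 1.0, 0.0, 0.0]
    links = [1.0, 1.0, 0.0]
    print(propagate_responses(scores, links, steps=1))
```

In this toy input, frames connected to the discernible frame by high affinity gain response, while the frame separated by zero affinity is left unchanged. Note that simple averaging also lowers the peak response, which a real method would likely handle differently.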
Pages: 109-121
Page count: 13
Related papers
50 in total
  • [1] Two Stage Emotion Recognition using Frame-level and Video-level Features
    Viegas, Carla
    2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 912 - 915
  • [2] Weakly Supervised Temporal Action Localization with Segment-Level Labels
    Ding, Xinpeng
    Wang, Nannan
    Li, Jie
    Gao, Xinbo
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 42 - 54
  • [3] Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
    Xu, Jiarui
    Wang, Xiaolong
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10055 - 10065
  • [4] Frame-Level Stutter Detection
    Harvill, John
    Hasegawa-Johnson, Mark
    Yoo, Changdong
    INTERSPEECH 2022, 2022, : 2843 - 2847
  • [5] Static Video Summarization Using Video Coding Features with Frame-Level Temporal Subsampling and Deep Learning
    Issa, Obada
    Shanableh, Tamer
    APPLIED SCIENCES-BASEL, 2023, 13 (10):
  • [6] Frame-level temporal calibration of video sequences from unsynchronized cameras
    Senem Velipasalar
    Wayne H. Wolf
    Machine Vision and Applications, 2008, 19 : 395 - 409
  • [7] Multi-Speaker Video Dialog with Frame-Level Temporal Localization
    Wang, Qiang
    Jiang, Pin
    Guo, Zhiyi
    Han, Yahong
    Zhao, Zhou
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12200 - 12207
  • [8] Frame-level temporal calibration of video sequences from unsynchronized cameras
    Velipasalar, Senem
    Wolf, Wayne H.
    MACHINE VISION AND APPLICATIONS, 2008, 19 (5-6) : 395 - 409
  • [9] A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels
    Zaheer, Muhammad Zaigham
    Mahmood, Arif
    Shin, Hochul
    Lee, Seung-Ik
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 1705 - 1709
  • [10] Weakly Supervised Temporal Action Detection With Temporal Dependency Learning
    Li, Bairong
    Liu, Ruixin
    Chen, Tianquan
    Zhu, Yuesheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (07) : 4473 - 4485