Learning frame-level affinity with video-level labels for weakly supervised temporal action detection

Cited by: 2
Authors
Li, Bairong [1]
Zhu, Yuesheng [1]
Liu, Ruixin [1]
Weng, Zhenyu [1]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Shenzhen, Peoples R China
Keywords
Video understanding; Temporal action detection; Weakly supervised learning; SEMANTIC SEGMENTATION; ACTION RECOGNITION; NETWORK; LOCALIZATION;
DOI
10.1016/j.neucom.2021.07.059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly supervised temporal action detection aims at localizing actions with only video-level labels rather than large numbers of frame-level labels. To this end, previous methods train a classification network to mine discernible action frames as detection results. However, the classification network is known to concentrate only on local discernible frames rather than the entire action instance. Therefore, substantial numbers of indiscernible action frames are not detected and the detection results are incomplete. To alleviate this issue, we propose a novel method to facilitate the detection of indiscernible frames based on learning frame-level affinities. In the proposed method, we design a network (named Affinity Network) for predicting affinities between pairs of adjacent frames. Then, the affinities are used as transition probabilities to propagate local responses to indiscernible frames. As a result, the responses of indiscernible frames are enhanced and their detection is facilitated. For learning the network, we propose strategies to synthesize frame-pair and video-pair training samples, which are conducive to learning frame-level affinities with only video-level labels. The experimental results on the THUMOS14 and ActivityNet1.2 datasets show that our framework outperforms most previous weakly supervised action detection methods, and is even competitive with some fully supervised action detection methods. (c) 2021 Elsevier B.V. All rights reserved.
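The propagation step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the chain graph with self-loops, the row normalization, and the function names `transition_matrix` and `propagate` are hypothetical and are not taken from the paper. The sketch only shows the general idea of using adjacent-frame affinities as transition probabilities to spread a response from a discernible frame to its neighbours.

```python
def transition_matrix(affinities):
    """Build a row-normalized transition matrix for a chain of frames.

    affinities[t] is the (assumed) affinity between frame t and frame t + 1,
    so a list of length T - 1 describes T frames.
    """
    n = len(affinities) + 1
    weights = [[0.0] * n for _ in range(n)]
    for t in range(n):
        weights[t][t] = 1.0  # self-loop: each frame keeps part of its own response
    for t, a in enumerate(affinities):
        weights[t][t + 1] = a  # symmetric link between adjacent frames
        weights[t + 1][t] = a
    # Normalize each row so its entries sum to 1 (transition probabilities).
    return [[w / sum(row) for w in row] for row in weights]


def propagate(responses, affinities, steps=1):
    """Diffuse per-frame responses along the chain for a number of steps."""
    trans = transition_matrix(affinities)
    current = list(responses)
    for _ in range(steps):
        current = [sum(p * r for p, r in zip(row, current)) for row in trans]
    return current


# Toy input (made up): frame 1 is discernible, frames 0-2 belong to the same
# action (high affinity), and frame 3 is background (zero affinity to frame 2).
responses = [0.0, 1.0, 0.0, 0.0]
affinities = [1.0, 1.0, 0.0]
enhanced = propagate(responses, affinities, steps=1)
```

In this toy case the response of frame 1 spreads to frames 0 and 2, while frame 3 stays at zero because its affinity to frame 2 is zero. A practical system would presumably predict the affinities with a trained network and might combine the propagated and original responses differently.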
Pages: 109-121
Number of pages: 13