Learning frame-level affinity with video-level labels for weakly supervised temporal action detection

Times cited: 2

Authors:

Li, Bairong [1]

Zhu, Yuesheng [1]

Liu, Ruixin [1]

Weng, Zhenyu [1]

Affiliation:

[1] Peking Univ, Shenzhen Grad Sch, Shenzhen, Peoples R China

Source:

NEUROCOMPUTING | 2021 / Vol. 463

Keywords:

Video understanding; Temporal action detection; Weakly supervised learning; SEMANTIC SEGMENTATION; ACTION RECOGNITION; NETWORK; LOCALIZATION;

DOI:

10.1016/j.neucom.2021.07.059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Weakly supervised temporal action detection aims to localize actions using only video-level labels instead of extensive frame-level annotations. To this end, previous methods train a classification network to mine discernible action frames as detection results. However, a classification network is known to concentrate only on locally discernible frames rather than on the entire action instance. As a result, many indiscernible action frames go undetected and the detection results are incomplete. To alleviate this issue, we propose a novel method that facilitates the detection of indiscernible frames by learning frame-level affinities. In the proposed method, we design a network (named Affinity Network) that predicts affinities between pairs of adjacent frames. The affinities are then used as transition probabilities to propagate local responses to indiscernible frames. Consequently, the responses of indiscernible frames are enhanced and their detection is facilitated. To train the network, we propose strategies for synthesizing frame-pair and video-pair training samples, which make it possible to learn frame-level affinities with only video-level labels. Experimental results on the THUMOS14 and ActivityNet1.2 datasets show that our framework outperforms most previous weakly supervised action detection methods and is even competitive with some fully supervised action detection methods. (c) 2021 Elsevier B.V. All rights reserved.
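The propagation step described in the abstract (adjacent-frame affinities used as transition probabilities to spread local responses) can be illustrated with a small random-walk sketch. This is a minimal sketch under stated assumptions, not the authors' published implementation: the function names `build_transition` and `propagate`, the self-loop weight, the row normalization, the fixed number of steps, and the example affinity values are all hypothetical. In the paper, the affinities would be predicted by the Affinity Network, which is not reproduced here.

```python
from typing import List, Sequence


def build_transition(affinities: Sequence[float], self_weight: float = 1.0) -> List[List[float]]:
    """Build a row-stochastic transition matrix over T frames (hypothetical helper).

    affinities[t] is an assumed affinity in [0, 1] between frame t and frame t + 1,
    so len(affinities) == T - 1. Only adjacent frames are connected.
    """
    if self_weight <= 0:
        raise ValueError("self_weight must be positive so every row has a nonzero sum")
    n = len(affinities) + 1
    matrix = [[0.0] * n for _ in range(n)]
    for t in range(n):
        matrix[t][t] = self_weight  # self-loop keeps part of each frame's own response
        if t > 0:
            matrix[t][t - 1] = affinities[t - 1]
        if t < n - 1:
            matrix[t][t + 1] = affinities[t]
        row_sum = sum(matrix[t])
        # Normalize so each row sums to 1 and can be read as transition probabilities.
        matrix[t] = [v / row_sum for v in matrix[t]]
    return matrix


def propagate(responses: Sequence[float], affinities: Sequence[float],
              steps: int = 10, self_weight: float = 1.0) -> List[float]:
    """Random-walk propagation r <- P r, repeated `steps` times (assumed scheme)."""
    if len(affinities) != len(responses) - 1:
        raise ValueError("need exactly one affinity per pair of adjacent frames")
    matrix = build_transition(affinities, self_weight)
    r = list(responses)
    for _ in range(steps):
        # Each frame's new response is an affinity-weighted average of itself and its neighbours.
        r = [sum(p * x for p, x in zip(row, r)) for row in matrix]
    return r


if __name__ == "__main__":
    # Hypothetical example: frames 0-3 belong to one action, but only frame 2 is
    # discernible (response 1.0). The low affinity between frames 3 and 4 marks a boundary.
    responses = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    affinities = [0.9, 0.9, 0.9, 0.01, 0.9]
    print([round(v, 3) for v in propagate(responses, affinities)])
```

In this sketch, the indiscernible frames 0, 1, and 3 receive substantial responses, while frames 4 and 5 stay close to zero because almost no probability crosses the low-affinity link. Row normalization turns each update into a weighted average, so the single peak is smoothed into a broader region; the abstract does not specify how the paper normalizes the transitions or how many propagation steps it uses.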

Pages: 109-121

Page count: 13
