Weakly Supervised Temporal Action Detection with Shot-Based Temporal Pooling Network

Cited by: 5
Authors
Su, Haisheng [1]
Zhao, Xu [1]
Lin, Tianwei [1]
Fei, Haiping [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
[2] Ind Internet Innovat Ctr Shanghai Co Ltd, Shanghai, Peoples R China
Keywords
Temporal action detection; Weak supervision; Shot-based sampling; Temporal pooling network; Class-specific;
DOI
10.1007/978-3-030-04212-7_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised temporal action detection in untrimmed videos is an important yet challenging task: only video-level class labels are available during training, yet the model must temporally localize actions in the videos. In this paper, we propose a novel architecture for this task. Specifically, we put forward a shot-based sampling method that generates a simpler but still representative feature sequence for action detection, in place of uniform sampling, which retains many irrelevant frames. Furthermore, to distinguish the action instances in a video, we design a multi-stage Temporal Pooling Network (TPN) that predicts video categories and localizes class-specific action instances, respectively. Experiments on the THUMOS14 dataset show that our method outperforms other state-of-the-art weakly supervised approaches.
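The contrast the abstract draws between uniform sampling and shot-based sampling can be illustrated with a small sketch. The abstract does not say how shots are detected or how each shot is summarized, so everything below is an assumption made for illustration: the boundary rule (an L1 distance threshold between consecutive frame features), the per-shot averaging, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def uniform_sample(features: List[Vector], step: int) -> List[Vector]:
    """Baseline: keep every `step`-th frame feature, regardless of content."""
    return features[::step]


def detect_shots(features: List[Vector], threshold: float) -> List[range]:
    """Split the sequence into shots wherever consecutive features differ
    by more than `threshold` (a hypothetical boundary rule)."""
    if not features:
        return []
    shots, start = [], 0
    for i in range(1, len(features)):
        if l1_distance(features[i - 1], features[i]) > threshold:
            shots.append(range(start, i))
            start = i
    shots.append(range(start, len(features)))
    return shots


def shot_based_sample(features: List[Vector], threshold: float) -> List[List[float]]:
    """Return one averaged feature per detected shot (assumed summary rule)."""
    sampled = []
    for shot in detect_shots(features, threshold):
        dim = len(features[shot.start])
        mean = [sum(features[i][d] for i in shot) / len(shot) for d in range(dim)]
        sampled.append(mean)
    return sampled


if __name__ == "__main__":
    # Toy per-frame features: three visually distinct segments.
    frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [9.0, 1.0]]
    print(uniform_sample(frames, step=2))
    print(shot_based_sample(frames, threshold=1.0))
```

On the toy input, uniform sampling picks frames at fixed intervals and can pass over a short segment entirely, whereas the shot-based variant emits one feature per detected segment. This is only meant to convey why a content-aware sampler can yield a shorter yet more representative sequence.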
Pages: 426-436
Number of pages: 11
Related papers
50 in total
  • [41] Deep cascaded action attention network for weakly-supervised temporal action localization
    Xia, Hui-fen
    Zhan, Yong-zhao
    Multimedia Tools and Applications, 2023, 82 : 29769 - 29787
  • [42] Temporal Attention-Pyramid Pooling for Temporal Action Detection
    Gan, Ming-Gang
    Zhang, Yan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3799 - 3810
  • [43] Temporal Pyramid Pooling Based Relation Network for Action Recognition
    Zheng, Zhenxing
    An, Gaoyun
    Ruan, Qiuqi
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 644 - 647
  • [44] Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection
    Zhao, Mengyao
    Hu, Zhengping
    Li, Shufang
    Bi, Shuai
    Sun, Zhe
    Signal, Image and Video Processing, 2022, 16 : 947 - 954
  • [45] Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection
    Zhao, Mengyao
    Hu, Zhengping
    Li, Shufang
    Bi, Shuai
    Sun, Zhe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2022, 16 (04) : 947 - 954
  • [46] Weakly supervised spatial-temporal attention network driven by tracking and consistency loss for action detection
    Zhu, Jinlei
    Chen, Houjin
    Pan, Pan
    Sun, Jia
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2022, 2022 (01)
  • [47] Weakly Supervised Temporal Action Localization by Multi-Stage Fusion Network
    Shen, Zhengyang
    Wang, Feng
    Dai, Jin
    IEEE ACCESS, 2020, 8 : 17287 - 17298
  • [48] Progressive enhancement network with pseudo labels for weakly supervised temporal action localization
    Wang, Qingyun
    Song, Yan
    Zou, Rong
    Shu, Xiangbo
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 87
  • [49] Deep feature enhancing and selecting network for weakly supervised temporal action localization
    Yu, Jiaruo
    Ge, Yongxin
    Qin, Xiaolei
    Li, Ziqiang
    Huang, Sheng
    Chen, Feiyu
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 80
  • [50] Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network
    Ren, Hao
    Ran, Wu
    Liu, Xingson
    Ren, Haoran
    Lu, Hong
    Zhang, Rui
    Jin, Cheng
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1008 - 1013