A two-stage temporal proposal network for precise action localization in untrimmed video

Authors
Fei Wang
Guorui Wang
Yuxuan Du
Zhenquan He
Yong Jiang
Affiliations
[1] Northeastern University, Faculty of Robot Science and Engineering
[2] Northeastern University, College of Information Science and Engineering
[3] Shenyang Institute of Automation, Chinese Academy of Sciences
Keywords
Action detection; Correctness discriminator; Extended context pooling; Temporal context regression
DOI
Not available
Abstract
In this paper, we propose a two-stage temporal proposal algorithm for action detection in long untrimmed videos. In the first stage, we propose a novel prior-minor watershed algorithm that generates action proposals by combining a precise prior watershed proposal algorithm with a minor supplementary sliding-window algorithm. We also propose a correctness discriminator that uses the sliding-window proposals to fill in proposals that the watershed algorithm may miss. In the second stage, we first propose extended context pooling (ECP), which consists of two modules: an internal module and a context module. The context module of ECP structures the proposals and enhances the extended features of action proposals. ECP is applied at different levels to model the action proposal region and to make its extended context region more targeted and precise. We then propose a temporal context regression network, which adopts a multi-task loss to train temporal coordinate regression and action/background classification simultaneously, and outputs precise temporal boundaries for the proposals. Finally, we propose prior-minor ranking to balance the contributions of the prior watershed proposals and the minor supplementary proposals. On three large-scale benchmarks, THUMOS14, ActivityNet (v1.2 and v1.3), and Charades, our approach outperforms other state-of-the-art methods and runs at over 1020 frames per second (fps) on a single NVIDIA Titan-X Pascal GPU, indicating that our method efficiently improves the precision of action localization.
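The abstract does not spell out how the watershed or the correctness discriminator works, so the following is only a minimal sketch of the general first-stage idea under stated assumptions. Per-snippet actionness scores in [0, 1] are "flooded" at several water levels (the levels are hypothetical), every maximal run of snippets at or above a level becomes a prior proposal, and multi-scale sliding windows serve as minor proposals. In place of the paper's correctness discriminator, the hypothetical function `fill_missing` uses a simple temporal-IoU rule that keeps only windows that no watershed proposal already covers; all function names and defaults are illustrative.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open snippet interval [start, end)


def watershed_proposals(actionness: Sequence[float],
                        levels: Sequence[float] = (0.3, 0.5, 0.7)) -> List[Segment]:
    """Flood the actionness curve at each (hypothetical) water level and
    return every maximal run of snippets at or above that level."""
    proposals = set()
    for level in levels:
        start = None
        for i, score in enumerate(actionness):
            if score >= level and start is None:
                start = i  # a run begins
            elif score < level and start is not None:
                proposals.add((start, i))  # the run ends before snippet i
                start = None
        if start is not None:  # a run that reaches the end of the video
            proposals.add((start, len(actionness)))
    return sorted(proposals)


def sliding_windows(num_snippets: int,
                    lengths: Sequence[int] = (4, 8),
                    overlap: float = 0.5) -> List[Segment]:
    """Dense multi-scale windows used as minor supplementary proposals."""
    windows = []
    for length in lengths:
        stride = max(1, int(length * (1 - overlap)))
        for start in range(0, num_snippets - length + 1, stride):
            windows.append((start, start + length))
    return windows


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def fill_missing(prior: List[Segment], minor: List[Segment],
                 max_tiou: float = 0.3) -> List[Segment]:
    """Hypothetical stand-in for the correctness discriminator: keep a window
    only if every prior (watershed) proposal overlaps it by less than max_tiou."""
    extra = [w for w in minor if all(tiou(w, p) < max_tiou for p in prior)]
    return prior + extra


actionness = [0.1, 0.6, 0.8, 0.9, 0.4, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1]
prior = watershed_proposals(actionness)
proposals = fill_missing(prior, sliding_windows(len(actionness)))
print(prior)      # [(1, 4), (1, 5), (2, 4)]
print(proposals)  # [(1, 4), (1, 5), (2, 4), (4, 8), (6, 10), (8, 12), (4, 12)]
```

In this toy run the watershed proposals cluster around the single high-actionness peak, and the supplementary windows only contribute segments in the low-actionness tail that the watershed step never proposed.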
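The abstract only states that ECP has an internal module and a context module and is used at different levels, so the sketch here assumes a layout in the spirit of temporal pyramid pooling. A proposal [start, end) is enlarged by `context_ratio` of its length on each side, the internal region is average-pooled at several pyramid levels, and the left and right context regions are average-pooled separately. The function names, the default ratio of 0.5, and the pyramid levels (1, 2) are all hypothetical, not the paper's settings.

```python
from typing import List, Sequence


def mean_pool(features: Sequence[Sequence[float]], start: int, end: int) -> List[float]:
    """Average per-snippet feature vectors over [start, end), clipped to the video."""
    dim = len(features[0])
    start, end = max(0, start), min(len(features), end)
    if end <= start:  # empty region, e.g. context that falls outside the video
        return [0.0] * dim
    span = features[start:end]
    return [sum(f[d] for f in span) / len(span) for d in range(dim)]


def extended_context_pool(features: Sequence[Sequence[float]], start: int, end: int,
                          context_ratio: float = 0.5,
                          levels: Sequence[int] = (1, 2)) -> List[float]:
    """Hypothetical ECP-style descriptor: [left context | internal pyramid | right context]."""
    length = end - start
    pad = max(1, round(length * context_ratio))  # context size on each side
    # Context module: the regions immediately before and after the proposal.
    left = mean_pool(features, start - pad, start)
    right = mean_pool(features, end, end + pad)
    # Internal module: split the proposal into 1, 2, ... equal parts and pool each part.
    internal: List[float] = []
    for parts in levels:
        for k in range(parts):
            s = start + (length * k) // parts
            e = start + (length * (k + 1)) // parts
            internal.extend(mean_pool(features, s, e))
    return left + internal + right


# Toy 1-D features: snippet i has feature value i.
features = [[float(i)] for i in range(10)]
print(extended_context_pool(features, 4, 8))  # [2.5, 5.5, 4.5, 6.5, 8.5]
```

Keeping the context regions in their own slots, rather than averaging them into the internal feature, is one way to let a downstream classifier or regressor tell whether activity continues past a proposal boundary.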
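For the temporal context regression network, one common way to realize a multi-task loss is to regress a proposal's center shift and log length ratio relative to its matched ground-truth segment with a smooth L1 loss, alongside a softmax cross-entropy for action/background classification. The sketch assumes exactly that parameterization and a single weight `lam`; the abstract does not state the paper's regression targets or loss weights, and `encode`, `decode`, and `multitask_loss` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

Segment = Tuple[float, float]


def encode(proposal: Segment, gt: Segment) -> Tuple[float, float]:
    """Regression targets: center shift (in proposal lengths) and log length ratio."""
    pc, pl = (proposal[0] + proposal[1]) / 2, proposal[1] - proposal[0]
    gc, gl = (gt[0] + gt[1]) / 2, gt[1] - gt[0]
    return (gc - pc) / pl, math.log(gl / pl)


def decode(proposal: Segment, offsets: Sequence[float]) -> Segment:
    """Apply predicted offsets to a proposal to obtain refined boundaries."""
    pc, pl = (proposal[0] + proposal[1]) / 2, proposal[1] - proposal[0]
    center = pc + offsets[0] * pl
    length = pl * math.exp(offsets[1])
    return center - length / 2, center + length / 2


def smooth_l1(x: float) -> float:
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(logits: Sequence[float], label: int,
                   pred_offsets: Sequence[float], target_offsets: Sequence[float],
                   lam: float = 1.0) -> float:
    """Softmax cross-entropy (label 0 = background) plus lam * smooth L1 on the
    offsets; the regression term only counts for action proposals."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    cls_loss = log_sum_exp - logits[label]
    reg_loss = 0.0
    if label > 0:
        reg_loss = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + lam * reg_loss


targets = encode((10.0, 20.0), (12.0, 24.0))
print(targets)                        # center shift 0.3, log(1.2) is about 0.182
print(decode((10.0, 20.0), targets))  # approximately (12.0, 24.0)
```

The regression term is skipped for background proposals because they have no ground-truth boundaries to move toward; the classification term is always computed.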
Pages: 2199-2211
Related papers (50 in total)
  • [1] A two-stage temporal proposal network for precise action localization in untrimmed video
    Wang, Fei
    Wang, Guorui
    Du, Yuxuan
    He, Zhenquan
    Jiang, Yong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (08) : 2199 - 2211
  • [2] Boundary Proposal Network for Two-Stage Natural Language Video Localization
    Xiao, Shaoning
    Chen, Long
    Zhang, Songyang
    Ji, Wei
    Shao, Jian
    Ye, Lu
    Xiao, Jun
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2986 - 2994
  • [3] SCALE MATTERS: TEMPORAL SCALE AGGREGATION NETWORK FOR PRECISE ACTION LOCALIZATION IN UNTRIMMED VIDEOS
    Gong, Guoqiang
    Zheng, Liangfeng
    Mu, Yadong
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
  • [4] Two-Stage Recognition Algorithm for Untrimmed Converter Steelmaking Flame Video
    Chen, Yi
    Liu, Jiyuan
    Xiong, Huilin
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019 : 268 - 279
  • [5] Two-stage transfer network for weakly supervised action localization
    Su, Qiubin
    NEUROCOMPUTING, 2019, 339 : 202 - 209
  • [6] TSRN: two-stage refinement network for temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Tang, Xianglong
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1375 - 1393
  • [7] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
    Shou, Zheng
    Wang, Dongang
    Chang, Shih-Fu
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016 : 1049 - 1058
  • [8] CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
    Shou, Zheng
    Chan, Jonathan
    Zareian, Alireza
    Miyazawa, Kazuyuki
    Chang, Shih-Fu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017 : 1417 - 1426
  • [9] Graph-based temporal action co-localization from an untrimmed video
    Wang, Le
    Zhai, Changbo
    Zhang, Qilin
    Tang, Wei
    Zheng, Nanning
    Hua, Gang
    NEUROCOMPUTING, 2021, 434 : 211 - 223