A two-stage temporal proposal network for precise action localization in untrimmed video

Cited by: 0
Authors
Fei Wang
Guorui Wang
Yuxuan Du
Zhenquan He
Yong Jiang
Affiliations
[1] Northeastern University, Faculty of Robot Science and Engineering
[2] Northeastern University, College of Information Science and Engineering
[3] Shenyang Institute of Automation, Chinese Academy of Sciences
Keywords
Action detection; Correctness discriminator; Extended context pooling; Temporal context regression
DOI
Not available
Abstract
In this paper, we propose a two-stage temporal proposal algorithm for action detection in long untrimmed videos. In the first stage, we propose a novel prior-minor watershed algorithm for generating action proposals, which combines a precise prior watershed proposal algorithm with a minor supplementary sliding-window algorithm. A correctness discriminator uses the sliding-window proposals to fill in the proposals that the watershed proposal algorithm may miss. In the second stage, we first propose extended context pooling (ECP), which consists of two modules: an internal module and a context module. The context information module of ECP structures the proposals and enhances the extended features of action proposals. Different levels of ECP are introduced to model the action proposal region and to make its extended context region more targeted and precise. We then propose a temporal context regression network, which adopts a multi-task loss to train temporal coordinate regression and action/background classification simultaneously, and outputs precise temporal boundaries for the proposals. In addition, we propose prior-minor ranking to balance the contributions of the prior watershed proposals and the minor supplementary proposals. On three large-scale benchmarks, THUMOS14, ActivityNet (v1.2 and v1.3), and Charades, our approach achieves superior performance compared with other state-of-the-art methods and runs at over 1020 frames per second (fps) on a single NVIDIA Titan-X Pascal GPU, indicating that our method can efficiently improve the precision of action localization.
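The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: per-frame actionness scores in [0, 1] are given; the watershed step is approximated by flooding the actionness curve at several water levels and keeping every maximal run of frames above a level (in the spirit of temporal actionness grouping, but without its tolerance for short dips); and the correctness discriminator is replaced by a hypothetical overlap rule that keeps a sliding window only if its temporal IoU with every watershed proposal is below a cutoff. All function names, thresholds, window lengths, and the stride ratio are illustrative.

```python
from typing import List, Tuple

# A segment is a half-open frame interval [start, end).
Segment = Tuple[int, int]


def watershed_proposals(scores: List[float], thresholds: List[float]) -> List[Segment]:
    """Flood the actionness curve at several water levels; every maximal run of
    frames at or above a level becomes one candidate proposal (simplified: no
    tolerance for short dips below the level)."""
    proposals = set()
    for tau in thresholds:
        start = None
        for t, s in enumerate(scores):
            if s >= tau and start is None:
                start = t                      # a run opens at this level
            elif s < tau and start is not None:
                proposals.add((start, t))      # the run closes before frame t
                start = None
        if start is not None:                  # the run reaches the last frame
            proposals.add((start, len(scores)))
    return sorted(proposals)


def sliding_windows(num_frames: int, lengths: List[int],
                    stride_ratio: float = 0.5) -> List[Segment]:
    """Dense multi-scale windows; the stride is a fraction of the window length."""
    windows = []
    for length in lengths:
        stride = max(1, int(length * stride_ratio))
        for start in range(0, num_frames - length + 1, stride):
            windows.append((start, start + length))
    return windows


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def prior_minor_proposals(scores: List[float], thresholds: List[float],
                          lengths: List[int], max_overlap: float = 0.3):
    """Prior = watershed proposals. Minor = sliding windows kept by a hypothetical
    overlap rule that stands in for the paper's correctness discriminator."""
    prior = watershed_proposals(scores, thresholds)
    minor = [w for w in sliding_windows(len(scores), lengths)
             if all(tiou(w, p) < max_overlap for p in prior)]
    return prior, minor


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    prior, minor = prior_minor_proposals(scores, thresholds=[0.5, 0.75], lengths=[4])
    print(prior)  # [(2, 4), (2, 5)]
    print(minor)  # [(4, 8), (6, 10), (8, 12)]
```

Because the minor windows are filtered against the prior proposals, they only add coverage where the actionness curve stayed low; how the two sets are then weighted against each other is the role of prior-minor ranking, which this sketch does not model.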
Pages: 2199-2211
Number of pages: 12
Related papers (50 in total)
  • [21] Spatio-Temporal Two-stage Fusion for video question answering
    Xu, Feifei
    Zhu, Yitao
    Wang, Chun
    Cao, Yangze
    Zhong, Zheng
    Li, Xiongmin
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 237
  • [22] A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos
    Gleason, Joshua
    Ranjan, Rajeev
    Schwarcz, Steven
    Castillo, Carlos D.
    Chen, Jun-Cheng
    Chellappa, Rama
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 141 - 150
  • [23] Video Self-Stitching Graph Network for Temporal Action Localization
    Zhao, Chen
    Thabet, Ali
    Ghanem, Bernard
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13638 - 13647
  • [24] TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal
    Zhu, Hongyuan
    Vial, Romain
    Lu, Shijian
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5814 - 5822
  • [25] Contextual Proposal Network for Action Localization
    Hsieh, He-Yen
    Chen, Ding-Jie
    Liu, Tyng-Luh
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 766 - 775
  • [26] Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning
    Yu, Guorui
    Hu, Yimin
    Zhang, Yuejie
    Feng, Rui
    Zhang, Tao
    Gao, Shang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 14571 - 14580
  • [27] TEMPORAL ATTENTION NETWORK FOR ACTION PROPOSAL
    Liu, Chenyang
    Xu, Xiangyu
    Zhang, Yujin
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2281 - 2285
  • [28] Exploring Temporal Preservation Networks for Precise Temporal Action Localization
    Yang, Ke
    Qiao, Peng
    Li, Dongsheng
    Lv, Shaohe
    Dou, Yong
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 7477 - 7484
  • [29] ATMNet: Adaptive Two-Stage Modular Network for Accurate Video Captioning
    Xu, Tianyang
    Zhang, Yunjie
    Song, Xiaoning
    Feng, Zheng-Hua
    Wu, Xiao-Jun
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025,
  • [30] Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
    Duan, Xin
    Cao, Yu
    Zhu, Lei
    Fu, Gang
    Wang, Xin
    Zhang, Renjie
    Li, Ping
    COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106 : 196 - 214