Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals

Cited by: 2
Authors
Song, Yeongtaek [1]
Kim, Incheol [1]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, 154-42 Gwanggyosan Ro, Suwon 16227, South Korea
Keywords
video action detection; region proposal; spatio-temporal action detection; recurrent neural network; RECOGNITION
DOI
10.3390/s19051085
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
This paper proposes a novel deep neural network model for spatio-temporal action detection, which localizes every action region and classifies the corresponding action in an untrimmed video. The proposed model uses a spatio-temporal region proposal method to detect multiple action regions effectively. First, in the temporal region proposal, anchor boxes are generated by targeting regions that are likely to contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage process to detect the temporal regions of actions that occur asynchronously. In addition, a spatial region proposal process is used to identify the principal agent performing an action among the people appearing in a video. Coarse-level features, which have been used frequently in conventional action-detection studies, contain comprehensive information about the whole video. However, they cannot provide detailed information about each person performing an action. To overcome this limitation, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Various experiments conducted using the LIRIS-HARL and UCF-10 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
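As a rough illustration of the anchor-based temporal region proposal mentioned in the abstract, the sketch below generates one-dimensional temporal anchors over a video and scores their overlap with a reference segment. It is not the authors' implementation: the function names, the `stride` and `lengths` parameters, and the use of temporal IoU as the matching criterion are all assumptions made for illustration only.

```python
from typing import List, Tuple

# A temporal segment as (start_frame, end_frame); this representation is an assumption.
Segment = Tuple[float, float]


def generate_temporal_anchors(num_frames: int, stride: int, lengths: List[int]) -> List[Segment]:
    """Place anchors of several lengths at regularly spaced centers (hypothetical scheme)."""
    anchors: List[Segment] = []
    for center in range(0, num_frames, stride):
        for length in lengths:
            # Clip each anchor to the video boundaries.
            start = max(0.0, center - length / 2)
            end = min(float(num_frames), center + length / 2)
            if end > start:
                anchors.append((start, end))
    return anchors


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    anchors = generate_temporal_anchors(num_frames=100, stride=50, lengths=[20])
    reference = (45.0, 65.0)  # hypothetical annotated action segment
    for anchor in anchors:
        print(anchor, round(temporal_iou(anchor, reference), 3))
```

In a proposal pipeline of this general kind, anchors whose overlap with an annotated segment exceeds a chosen threshold would typically be treated as positive candidates; the threshold and the two-stage refinement described in the abstract are not modeled here.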
Pages: 19
Related papers
50 in total
  • [1] A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos
    Gleason, Joshua
    Ranjan, Rajeev
    Schwarcz, Steven
    Castillo, Carlos D.
    Chen, Jun-Cheng
    Chellappa, Rama
    [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 141 - 150
  • [2] Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos
    Gkountakos, Konstantinos
    Touska, Despoina
    Ioannidis, Konstantinos
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    [J]. PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 451 - 455
  • [3] JOINT SPATIO-TEMPORAL ACTION LOCALIZATION IN UNTRIMMED VIDEOS WITH PER-FRAME SEGMENTATION
    Duan, Xuhuan
    Wang, Le
    Zhai, Changbo
    Zhang, Qilin
    Niu, Zhenxing
    Zheng, Nanning
    Hua, Gang
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 918 - 922
  • [4] Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation
    Wang, Le
    Duan, Xuhuan
    Zhang, Qilin
    Niu, Zhenxing
    Hua, Gang
    Zheng, Nanning
    [J]. SENSORS, 2018, 18 (05)
  • [5] Spatio-temporal Object Detection Proposals
    Oneata, Dan
    Revaud, Jerome
    Verbeek, Jakob
    Schmid, Cordelia
    [J]. COMPUTER VISION - ECCV 2014, PT III, 2014, 8691 : 737 - 752
  • [6] Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization
    Ghamsarian, Negin
    Taschwer, Mario
    Putzgruber-Adamitsch, Doris
    Sarny, Stephanie
    Schoeffmann, Klaus
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 10720 - 10727
  • [7] Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos
    Gao, Zhanning
    Wang, Le
    Zhang, Qilin
    Niu, Zhenxing
    Zheng, Nanning
    Hua, Gang
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8328 - 8335
  • [8] Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos
    Duta, Ionut Cosmin
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3205 - 3214
  • [9] Temporal Action Localization in Untrimmed Videos Using Action Pattern Trees
    Song, Hao
    Wu, Xinxiao
    Zhu, Bing
    Wu, Yuwei
    Chen, Mei
    Jia, Yunde
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (03) : 717 - 730
  • [10] On Spatio-Temporal Saliency Detection in Videos using Multilinear PCA
    Sidibe, Desire
    Rastgoo, Mojdeh
    Meriaudeau, Fabrice
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 1876 - 1880