Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks

Cited by: 5
Authors
Wang, Y. [1]
Shen, X. J. [1]
Chen, H. P. [1]
Sun, J. X. [2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Coll Software, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
video action recognition; 3D Convolutional Neural Network; spatiotemporal information; bilinear fusion;
DOI
10.1134/S105466182103024X
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Traditional human action recognition algorithms based on feature extraction are complicated and achieve low recognition accuracy. We present an algorithm for recognizing human actions in videos based on spatio-temporal fusion using 3D convolutional neural networks (3D CNNs). The algorithm contains two subnetworks, which extract deep spatial information and deep temporal information, respectively, and a bilinear fusion policy is applied to obtain the final fused spatio-temporal information. Spatial information is represented by a gradient feature, and temporal information is represented by optical flow. By constructing a new 3D CNN, the fused spatio-temporal information can capture deep features from multiple angles. The proposed algorithm is compared with current mainstream algorithms on the KTH and UCF101 datasets, showing effectiveness and high recognition accuracy.
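The abstract does not spell out the bilinear fusion step, so the following is only a minimal sketch of one common formulation (sum-pooled outer products followed by signed square root and L2 normalization), which may differ from the authors' method. The function name `bilinear_fuse`, the list-of-vectors layout, and the normalization steps are all assumptions made for illustration.

```python
import math


def bilinear_fuse(spatial, temporal):
    """Fuse two feature streams by sum-pooled outer products.

    Hypothetical helper, not taken from the paper.
    spatial:  list of N feature vectors of length C1, one per location
    temporal: list of N feature vectors of length C2, same locations
    Returns a flat list of length C1 * C2.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must cover the same locations")
    c1, c2 = len(spatial[0]), len(temporal[0])

    # Accumulate the outer product s * t^T over all locations.
    pooled = [0.0] * (c1 * c2)
    for s, t in zip(spatial, temporal):
        for i in range(c1):
            for j in range(c2):
                pooled[i * c2 + j] += s[i] * t[j]

    # Signed square root (assumed post-processing, common for bilinear features).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]

    # L2 normalization; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy inputs: two locations, a 2-channel spatial stream, a 1-channel temporal stream.
spatial_features = [[1.0, 0.0], [0.0, 2.0]]
temporal_features = [[3.0], [4.0]]
fused = bilinear_fuse(spatial_features, temporal_features)
```

In this sketch every spatial channel interacts multiplicatively with every temporal channel, which is what distinguishes bilinear fusion from simple concatenation or averaging of the two streams. In a real two-stream 3D CNN the inputs would be the feature maps produced by the spatial and temporal subnetworks rather than hand-written lists.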
Pages: 580-587
Page count: 8
Related papers
50 in total
  • [1] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Y. Wang
    X. J. Shen
    H. P. Chen
    J. X. Sun
    [J]. Pattern Recognition and Image Analysis, 2021, 31 : 580 - 587
  • [2] Learning Representations from Spatio-Temporal Distance Maps for 3D Action Recognition with Convolutional Neural Networks
    Naveenkumar, M.
    Domnic, S.
    [J]. ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL, 2019, 8 (02): : 5 - 18
  • [3] Spatio-Temporal Fusion Networks for Action Recognition
    Cho, Sangwoo
    Foroosh, Hassan
    [J]. COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 347 - 364
  • [4] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428
  • [5] Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition
    Hara, Kensho
    Kataoka, Hirokatsu
    Satoh, Yutaka
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 3154 - 3160
  • [6] Spatio-Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks
    Huy Hieu Pham
    Salmane, Houssam
    Khoudour, Louahdi
    Crouzil, Alain
    Zegers, Pablo
    Velastin, Sergio A.
    [J]. SENSORS, 2019, 19 (08)
  • [7] A Spatio-Temporal Convolutional Neural Network for Skeletal Action Recognition
    Hu, Lizhang
    Xu, Jinhua
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 377 - 385
  • [8] Action Recognition Based on Features Fusion and 3D Convolutional Neural Networks
    Liu, Lulu
    Hu, Fangyu
    Zhou, Jiahui
    [J]. PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2016, : 178 - 181
  • [9] Online Spatio-temporal 3D Convolutional Neural Network for Early Recognition of Handwritten Gestures
    Mocaer, William
    Anquetil, Eric
    Kulpa, Richard
    [J]. DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT I, 2021, 12821 : 221 - 236
  • [10] Spatio-temporal constraints for on-line 3D object recognition in videos
    Noceti, Nicoletta
    Delponte, Elisabetta
    Odone, Francesca
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2009, 113 (12) : 1198 - 1209