Video-Based Temporal Enhanced Action Recognition

Authors
Zhang H. [1, 2]
Fu D. [1, 3]
Zhou K. [4]
Affiliations
[1] School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing
[2] Shunde Graduate School, University of Science and Technology Beijing, Foshan
[3] Beijing Engineering Research Center of Industrial Spectrum Imaging, University of Science and Technology Beijing, Beijing
[4] School of Advanced Engineering, University of Science and Technology Beijing, Beijing
Keywords
Action Recognition; Deep Learning; Industrial Surveillance Video; Temporal Enhanced Structure
DOI
10.16451/j.cnki.issn1003-6059.202010010
Abstract
To address spatio-temporal modeling in video action recognition, a temporal enhanced action recognition algorithm based on fused spatio-temporal features is proposed within a deep learning framework. To lower the cost of video-level temporal modeling, a sparse sampling strategy that adapts to changes in video duration is employed. In the recognition stage, the temporal difference between adjacent feature maps is calculated to enhance motion information at the feature level. A combination of the residual structure and the temporal enhanced structure is introduced to further improve the representation ability of the network. Experimental results show that the proposed algorithm obtains higher accuracy on the UCF101 and HMDB51 datasets and achieves better results in an actual industrial operation recognition scene with a smaller network scale. © 2020, Science Press. All rights reserved.
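The two operations named in the abstract, sparse sampling and the temporal difference between adjacent feature maps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the choice of the center frame of each segment, the (T, C, H, W) feature layout, and the residual-style addition of the difference back onto the features are all assumptions made for illustration.

```python
import numpy as np


def sparse_sample_indices(num_frames, num_segments):
    """Return one frame index per segment (hypothetical helper).

    The video is split into `num_segments` equal-length segments and the
    center frame of each is chosen, so the number of sampled frames stays
    fixed regardless of video duration. Center sampling is an assumption.
    """
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    edges = np.linspace(0, num_frames, num_segments + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Truncate to integer indices and clamp to the last valid frame.
    return np.minimum(centers.astype(int), num_frames - 1)


def temporal_enhance(features):
    """Add the adjacent-step difference to each feature map (hypothetical).

    `features` has shape (T, C, H, W), one feature map per sampled frame.
    The difference between step t+1 and step t is added to step t; the
    last step has no successor and receives a zero difference. Adding the
    difference residually is an assumption, not taken from the paper.
    """
    diff = np.zeros_like(features)
    diff[:-1] = features[1:] - features[:-1]
    return features + diff
```

For instance, `sparse_sample_indices(100, 4)` and `sparse_sample_indices(1000, 4)` both return four indices, which is what keeps the video-level modeling cost independent of clip length in this sketch.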
Pages: 951-958
Page count: 7