Behavior recognition method based on improved 3D convolutional neural network

Cited: 0
Authors
Zhang X. [1 ]
Li C. [1 ]
Sun L. [1 ]
Zhang M. [1 ]
Affiliation
[1] School of Mechanical Engineering, Hebei University of Technology, Tianjin
Keywords
3D convolutional neural network; Behavior recognition; Deep learning theory; Dual flow data; Residual network;
DOI
10.13196/j.cims.2019.08.014
Abstract
To address the problems of excessively large video-stream data and the large number of 3D convolution kernel parameters in video-based human behavior recognition, which lead to long training times and difficult parameter tuning, a method based on the 3D convolutional neural network was proposed to divide the 3D convolution kernel into two kinds of kernels: a spatial-domain kernel and a temporal-domain kernel. The two data streams formed by the two kernels interacted with each other, which optimized the network structure and reduced the number of parameter settings. Training and validation were performed on two behavior recognition datasets, KTH and UCF101, yielding recognition accuracies of 96.2% and 90.7% respectively. The results showed that the proposed method sped up training by 7.5%~7.8% while maintaining training accuracy. This method can therefore effectively reduce the hardware requirements of deep learning for behavior recognition and improve the efficiency of model training, and could be widely applied in the field of intelligent robots. © 2019, Editorial Department of CIMS. All rights reserved.
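The parameter saving from splitting a full 3D kernel into a spatial kernel followed by a temporal kernel can be illustrated with a simple count. This is a minimal sketch under assumed kernel shapes (a 1×kh×kw spatial kernel followed by a kt×1×1 temporal kernel, a common factorization); the paper's exact kernel configuration and channel widths are not specified in the abstract, so the numbers below are illustrative only.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw

def factorized_params(c_in, c_out, kt, kh, kw, c_mid=None):
    """Weight count of a spatial (1 x kh x kw) convolution followed by a
    temporal (kt x 1 x 1) convolution, with an intermediate channel width
    c_mid (defaults to c_out)."""
    c_mid = c_out if c_mid is None else c_mid
    spatial = c_in * c_mid * kh * kw   # 1 x kh x kw kernel
    temporal = c_mid * c_out * kt      # kt x 1 x 1 kernel
    return spatial + temporal

# Example: 64 -> 64 channels with a 3x3x3 receptive field.
full = conv3d_params(64, 64, 3, 3, 3)       # 110592 weights
fact = factorized_params(64, 64, 3, 3, 3)   # 49152 weights
print(full, fact, f"{1 - fact / full:.1%} fewer parameters")
```

With these assumed shapes the factorized layer uses well under half the weights of the full 3D kernel, which is consistent with the abstract's claim of a reduced parameter budget and faster training.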
Pages: 2000-2006
Page count: 6
References
12 in total
  • [1] Yuan C., Li X., Hu W., Et al., 3D R transform on spatio-temporal interest points for action recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 724-730, (2013)
  • [2] Lam T.H.W., Cheung K.H., Liu J.N.K., Gait flow image: A silhouette-based gait representation for human identification, Pattern Recognition, 44, 4, pp. 973-987, (2011)
  • [3] Lin B., Fang B., Spatial-temporal histograms of gradients and HOD-VLAD encoding for human action recognition, Proceedings of the International Conference on Security, Pattern Analysis, and Cybernetics, pp. 678-683, (2017)
  • [4] LeCun Y., Boser B., Denker J.S., Et al., Handwritten digit recognition with a back-propagation network, Proceedings of the Advances in Neural Information Processing Systems 2, pp. 396-404, (1990)
  • [5] Bengio Y., Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2, 1, pp. 1-127, (2009)
  • [6] Ji S., Xu W., Yang M., Et al., 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1, pp. 221-231, (2013)
  • [7] Zhu W., Hu J., Sun G., Et al., A key volume mining deep framework for action recognition, Proceedings of the Computer Vision & Pattern Recognition, pp. 1991-1999, (2016)
  • [8] Qin Y., Mo L., Xie B., Feature fusion for human action recognition based on classical descriptors and 3D convolutional networks, Proceedings of the 11th International Conference on Sensing Technology, pp. 1-5, (2017)
  • [9] Szegedy C., Vanhoucke V., Ioffe S., Et al., Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, pp. 2818-2826, (2016)
  • [10] Boureau Y., Bach F., LeCun Y., Et al., Learning mid-level features for recognition, Proceedings of the Computer Vision & Pattern Recognition, pp. 2559-2566, (2010)