Research on Spatio-Temporal Weighted Posture Motion Features for Human Skeleton Action Recognition

Cited by: 0
Authors
Ding C.-Y. [1 ,2 ]
Liu K. [1 ,2 ]
Li G. [1 ,2 ]
Yan L. [1 ,2 ]
Chen B.-Y. [1 ,2 ]
Zhong Y.-M. [1 ,2 ]
Affiliations
[1] Department of Computer Science and Technology, Xidian University, Xi'an
[2] Beijing Institute of Telemetry Technology, Beijing
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Feature representation; Linear classifier; Skeleton sequence; Temporal model
DOI
10.11897/SP.J.1016.2020.00029
Abstract
In recent years, computer-vision applications (e.g., behavior surveillance, human-computer interaction, electronic games, and health care) have gained increasing popularity. The key technology behind these interactive applications is enabling machines to understand human movements, a problem known as human action recognition. Despite extensive prior research, accurately recognizing human actions from traditional RGB videos remains challenging due to interference factors such as lighting changes, view changes, occlusion, and background clutter.

Recently, the spread of depth sensors and of real-time skeleton estimation algorithms based on depth images has created new opportunities for action recognition research. Depth maps provide additional depth information with which the desired targets can be easily segmented from complex scenes, significantly alleviating background clutter and greatly simplifying the recognition model; all of this has boosted the development of skeleton-based action recognition. Skeleton estimation algorithms define the skeleton as a graphical model composed of the positions of the human trunk, head, and limbs, and can estimate the 3D positions of skeleton joints from depth images quickly and accurately, at about 200 frames per second on the Xbox 360 GPU.

Existing skeleton-based recognition methods can be roughly divided into two categories: joint-based methods and part-based methods. Joint-based methods regard the human skeleton as a set of joints and describe it with location-correlation features of the joints, such as joint position features, relative joint location features, and joint orientation features in a fixed coordinate system. Part-based methods, on the other hand, regard the human skeleton as a set of rigid segments and represent it with joint angle features, bio-inspired 3D features, and geometric relationship features of different rigid body parts. Most of this research focuses on extracting the spatial information of different body joints within a single frame and the temporal information of body joints between adjacent frames to represent an action sequence, but it does not take into account that different body joints and postures may vary in importance when deciding which action class a sample belongs to.

Therefore, this paper proposes an action recognition method based on spatio-temporal weighted posture-motion features. Each 3D video sequence can be regarded as an ordered set of static postures, and each static posture as a set of joints. Based on this, the spatial relationships of all joints within each static posture are first computed to obtain the spatial features of the video sequence, and the location relationships of the same joint between adjacent frames are then computed to obtain the temporal features. A normalization scheme is introduced to obtain the final representation of the skeleton sequence. A bilinear classifier is adopted to learn the weights of the joints and static postures for each action class, thereby identifying the informative joints and postures. Meanwhile, the dynamic time warping (DTW) algorithm and the Fourier temporal pyramid (FTP) representation are introduced for temporal modeling and better temporal analysis, and an SVM is finally used for action classification.
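As a concrete illustration of the feature-extraction step described above, the following is a minimal sketch, assuming the skeleton sequence is stored as a (frames × joints × 3) array. The function name, the choice of pairwise relative joint positions for the spatial part, and per-joint frame-to-frame displacements for the temporal part are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact formulation): spatial features from
# pairwise relative joint positions within each frame, temporal features from
# per-joint displacement between adjacent frames, followed by normalization.
import numpy as np

def posture_motion_features(seq: np.ndarray) -> np.ndarray:
    """seq: (T, J, 3) array of T frames, J joints, 3D joint positions."""
    T, J, _ = seq.shape
    feats = []
    for t in range(T):
        pose = seq[t]                                  # static posture, (J, 3)
        # Spatial part: relative position of every ordered joint pair.
        spatial = (pose[:, None, :] - pose[None, :, :]).reshape(-1)
        # Temporal part: displacement of each joint since the previous frame.
        prev = seq[t - 1] if t > 0 else pose
        temporal = (pose - prev).reshape(-1)
        feats.append(np.concatenate([spatial, temporal]))
    feats = np.asarray(feats)                          # (T, D)
    # Normalize each feature dimension to zero mean and unit variance.
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
```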
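The bilinear weighting step can be sketched in the same spirit: factoring the classifier weights into a joint-weight vector and a posture-weight vector makes the class score separable, so the learned vectors directly indicate which joints and which postures are informative for a class. The shapes and names below are assumptions for illustration.

```python
# Hypothetical sketch of a bilinear score over a joint-by-posture feature map X:
# score(X) = w_joint^T X w_posture + b, following the bilinear-classifier idea.
import numpy as np

def bilinear_score(X: np.ndarray, w_joint: np.ndarray,
                   w_posture: np.ndarray, bias: float = 0.0) -> float:
    """X: (J, T) responses of J joints over T static postures."""
    return float(w_joint @ X @ w_posture + bias)
```

Factored weights of this kind are typically trained by alternating optimization: fix w_posture and solve a linear problem (e.g., a linear SVM) for w_joint, then fix w_joint and solve for w_posture, repeating until convergence.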
Experimental results on three challenging datasets demonstrate that the proposed approach achieves competitive, and in some cases the best, performance compared with state-of-the-art methods. © 2020, Science Press. All rights reserved.
Pages: 29-40
Page count: 11