YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction

Cited by: 14
Authors
Yadav, Santosh Kumar [1,2]
Agarwal, Aayush [3]
Kumar, Ashish [3]
Tiwari, Kamlesh [3]
Pandey, Hari Mohan [4]
Akbar, Shaik Ali [1,2]
Affiliations
[1] Acad Sci & Innovat Res AcSIR, Ghaziabad 201002, Uttar Pradesh, India
[2] Cent Elect Engn Res Inst CEERI, Cyber Phys Syst, CSIR, Pilani 333031, India
[3] Birla Inst Technol & Sci Pilani, Dept CSIS, Pilani Campus, Pilani 333031, Rajasthan, India
[4] Bournemouth Univ, Dept Comp & Informat, Poole, England
Keywords
Action recognition; Computer vision; Posture correction; Yoga and exercise;
DOI
10.1016/j.knosys.2022.109097
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Yoga is a traditional Indian exercise. It specifies various body postures called asanas; practicing them is beneficial for physical, mental, and spiritual well-being. To support yoga practitioners, there is a need for an expert yoga asana recognition system that can automatically analyze a practitioner's postures and provide suitable posture correction instructions. This paper proposes YogNet, a multi-person yoga expert system for 20 asanas using a two-stream deep spatiotemporal neural network architecture. The first stream utilizes a keypoint detection approach to detect the practitioner's pose, followed by the formation of bounding boxes across the subject. The model then applies time-distributed convolutional neural networks (CNNs) to extract frame-wise postural features, followed by regularized long short-term memory (LSTM) networks to give temporal predictions. The second stream utilizes 3D-CNNs for spatiotemporal feature extraction from RGB videos. Finally, the scores of the two streams are fused using multiple fusion techniques. A yoga asana recognition database (YAR) containing 1206 videos was collected using a single 2D web camera over 367 min with the help of 16 participants, and contains four view variations, i.e., front, back, left, and right sides. The proposed system is novel in that it is the first two-stream deep learning-based system that can perform multi-person yoga asana recognition and correction in real time. Simulation results reveal that the YogNet system achieved 77.29%, 89.29%, and 96.31% accuracy using the pose stream, the RGB stream, and the fusion of both streams, respectively. These results are impressive and sufficiently high to recommend general adoption of the system.
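The score-level fusion of the two streams described in the abstract can be sketched as follows. This is an illustrative late-fusion example, not the paper's exact implementation: the logit vectors, the softmax normalization, and the equal stream weights are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(pose_logits, rgb_logits, w_pose=0.5, w_rgb=0.5):
    """Late (score-level) fusion of the pose and RGB streams.

    pose_logits, rgb_logits: raw per-class scores from each stream,
    shape (num_classes,). Weights are hypothetical; the paper reports
    experimenting with multiple fusion techniques.
    Returns fused class probabilities and the predicted asana index.
    """
    probs = w_pose * softmax(pose_logits) + w_rgb * softmax(rgb_logits)
    return probs, int(np.argmax(probs))

# Hypothetical per-stream scores over the paper's 20 asana classes.
pose = np.random.randn(20)
rgb = np.random.randn(20)
probs, pred = fuse_scores(pose, rgb)
```

With weights that sum to one, the fused vector remains a valid probability distribution, so the prediction can be read off with a single argmax.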
Pages: 16