YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction

Cited by: 14
Authors
Yadav, Santosh Kumar [1, 2]
Agarwal, Aayush [3]
Kumar, Ashish [3]
Tiwari, Kamlesh [3]
Pandey, Hari Mohan [4]
Akbar, Shaik Ali [1, 2]
Affiliations
[1] Acad Sci & Innovat Res AcSIR, Ghaziabad 201002, Uttar Pradesh, India
[2] Cent Elect Engn Res Inst CEERI, Cyber Phys Syst, CSIR, Pilani 333031, India
[3] Birla Inst Technol & Sci Pilani, Dept CSIS, Pilani Campus, Pilani 333031, Rajasthan, India
[4] Bournemouth Univ, Dept Comp & Informat, Poole, England
Keywords
Action recognition; Computer vision; Posture correction; Yoga and exercise;
DOI
10.1016/j.knosys.2022.109097
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Yoga is a traditional Indian exercise. It specifies various body postures called asanas, and practicing them benefits physical, mental, and spiritual well-being. To support yoga practitioners, there is a need for an expert yoga asana recognition system that can automatically analyze a practitioner's postures and provide suitable posture correction instructions. This paper proposes YogNet, a multi-person yoga expert system for 20 asanas that uses a two-stream deep spatiotemporal neural network architecture. The first stream uses a keypoint detection approach to detect the practitioner's pose and then forms bounding boxes around the subject. The model then applies time-distributed convolutional neural networks (CNNs) to extract frame-wise postural features, followed by regularized long short-term memory (LSTM) networks to give temporal predictions. The second stream uses 3D-CNNs for spatiotemporal feature extraction from RGB videos. Finally, the scores of the two streams are fused using multiple fusion techniques. A yoga asana recognition (YAR) database containing 1206 videos, totaling 367 min, was collected using a single 2D web camera with the help of 16 participants; it covers four view variations: front, back, left, and right. The proposed system is novel in that it is the first two-stream deep learning-based system that can perform multi-person yoga asana recognition and correction in real time. Simulation results show that the YogNet system achieved accuracies of 77.29%, 89.29%, and 96.31% using the pose stream, the RGB stream, and the fusion of both streams, respectively. These results are impressive and high enough to recommend general adoption of the system.
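The abstract says the scores of the two streams are fused "using multiple fusion techniques" but does not name them. As a minimal sketch of what such score-level (late) fusion can look like, the pure-Python example below combines two per-class score vectors with four common rules (average, weighted sum, maximum, product). The function names `fuse_scores` and `predict`, the `pose_weight` parameter, and the choice of rules are all assumptions made for illustration; they are not taken from the paper.

```python
def fuse_scores(pose_scores, rgb_scores, method="average", pose_weight=0.5):
    """Late-fuse two per-class score vectors into one normalized vector.

    Hypothetical helper: the fusion rules here are common choices,
    not necessarily the ones used by YogNet.
    """
    if len(pose_scores) != len(rgb_scores):
        raise ValueError("both streams must score the same set of classes")

    pairs = list(zip(pose_scores, rgb_scores))
    if method == "average":
        fused = [(p + r) / 2 for p, r in pairs]
    elif method == "weighted":
        # pose_weight is an assumed tuning knob in [0, 1]
        fused = [pose_weight * p + (1 - pose_weight) * r for p, r in pairs]
    elif method == "max":
        fused = [max(p, r) for p, r in pairs]
    elif method == "product":
        fused = [p * r for p, r in pairs]
    else:
        raise ValueError(f"unknown fusion method: {method}")

    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero; cannot normalize")
    # Renormalize so the fused scores sum to 1
    return [f / total for f in fused]


def predict(fused_scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


if __name__ == "__main__":
    # Toy scores for three classes (made-up numbers, not from the paper)
    pose = [0.5, 0.3, 0.2]
    rgb = [0.2, 0.7, 0.1]
    for method in ("average", "weighted", "max", "product"):
        print(method, predict(fuse_scores(pose, rgb, method=method)))
```

In this toy input the pose stream alone would pick class 0, while the average, max, and product rules all pick class 1 because the RGB stream is more confident. Which rule works best on real data is an empirical question that the abstract does not settle.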
Pages: 16