YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction

Cited by: 14

Authors:

Yadav, Santosh Kumar ^{[1,2]}

Agarwal, Aayush ^{[3]}

Kumar, Ashish ^{[3]}

Tiwari, Kamlesh ^{[3]}

Pandey, Hari Mohan ^{[4]}

Akbar, Shaik Ali ^{[1,2]}

Affiliations:

[1] Acad Sci & Innovat Res AcSIR, Ghaziabad 201002, Uttar Pradesh, India

[2] Cent Elect Engn Res Inst CEERI, Cyber Phys Syst, CSIR, Pilani 333031, India

[3] Birla Inst Technol & Sci Pilani, Dept CSIS, Pilani Campus, Pilani 333031, Rajasthan, India

[4] Bournemouth Univ, Dept Comp & Informat, Poole, England

Source:

KNOWLEDGE-BASED SYSTEMS | 2022 / Vol. 250

Keywords:

Action recognition; Computer vision; Posture correction; Yoga and exercise

DOI:

10.1016/j.knosys.2022.109097

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Yoga is a traditional Indian exercise. It specifies various body postures called asanas, and practicing them is beneficial for physical, mental, and spiritual well-being. To support yoga practitioners, there is a need for an expert yoga asana recognition system that can automatically analyze a practitioner's postures and provide suitable posture correction instructions. This paper proposes YogNet, a multi-person yoga expert system for 20 asanas using a two-stream deep spatiotemporal neural network architecture. The first stream utilizes a keypoint detection approach to detect the practitioner's pose, followed by the formation of bounding boxes around the subject. The model then applies time-distributed convolutional neural networks (CNNs) to extract frame-wise postural features, followed by regularized long short-term memory (LSTM) networks to give temporal predictions. The second stream utilizes 3D-CNNs for spatiotemporal feature extraction from RGB videos. Finally, the scores of the two streams are fused using multiple fusion techniques. A yoga asana recognition (YAR) database containing 1206 videos, totaling 367 min, was collected using a single 2D web camera with the help of 16 participants; it contains four view variations, i.e., front, back, left, and right sides. The proposed system is novel as it is the earliest two-stream deep learning-based system that can perform multi-person yoga asana recognition and correction in real time. Simulation results reveal that the YogNet system achieved accuracies of 77.29%, 89.29%, and 96.31% using the pose stream, the RGB stream, and the fusion of both streams, respectively. These results are impressive and sufficiently high to recommend general adoption of the system.
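The abstract states that the scores of the two streams are fused "using multiple fusion techniques" but does not name them. The sketch below illustrates a few common score-level (late) fusion rules that such a system might use; the specific rules, the function names, the 0.4 pose weight, and the three-class example scores are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def _check(pose: Sequence[float], rgb: Sequence[float]) -> None:
    """Both streams must score the same set of classes."""
    if len(pose) != len(rgb) or len(pose) == 0:
        raise ValueError("score vectors must be non-empty and of equal length")


def fuse_average(pose: Sequence[float], rgb: Sequence[float]) -> List[float]:
    """Average the per-class scores of the two streams."""
    _check(pose, rgb)
    return [(p + r) / 2.0 for p, r in zip(pose, rgb)]


def fuse_weighted(pose: Sequence[float], rgb: Sequence[float],
                  w_pose: float = 0.4) -> List[float]:
    """Convex combination; w_pose is a hypothetical, tunable weight."""
    _check(pose, rgb)
    return [w_pose * p + (1.0 - w_pose) * r for p, r in zip(pose, rgb)]


def fuse_max(pose: Sequence[float], rgb: Sequence[float]) -> List[float]:
    """Keep the larger of the two scores for each class."""
    _check(pose, rgb)
    return [max(p, r) for p, r in zip(pose, rgb)]


def fuse_product(pose: Sequence[float], rgb: Sequence[float]) -> List[float]:
    """Multiply per-class scores and renormalize so they sum to 1."""
    _check(pose, rgb)
    prod = [p * r for p, r in zip(pose, rgb)]
    total = sum(prod)
    if total == 0.0:
        raise ValueError("product of scores is zero for every class")
    return [x / total for x in prod]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up softmax outputs for three classes (not data from the paper).
pose_scores = [0.5, 0.3, 0.2]   # pose stream alone would pick class 0
rgb_scores = [0.2, 0.7, 0.1]    # RGB stream alone would pick class 1

for fuse in (fuse_average, fuse_weighted, fuse_max, fuse_product):
    print(fuse.__name__, predict(fuse(pose_scores, rgb_scores)))
```

In this toy example the two streams disagree when used alone, and each fusion rule resolves the disagreement in favor of the class with stronger combined evidence. In practice the choice of rule and any weights would be validated on held-out data.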


Pages: 16

