Video-Based Human Activity Recognition Using Deep Learning Approaches

Cited by: 9
Authors
Surek, Guilherme Augusto Silva [1 ]
Seman, Laio Oriel [2 ]
Stefenon, Stefano Frizzo [3 ,4 ]
Mariani, Viviana Cocco [5 ,6 ]
Coelho, Leandro dos Santos [1 ,5 ]
Institutions
[1] Pontif Catholic Univ Parana PUCPR, Ind & Syst Engn Grad Program PPGEPS, BR-80215901 Curitiba, Brazil
[2] Univ Vale Itajai, Grad Program Appl Comp Sci, BR-88302901 Itajai, Brazil
[3] Fdn Bruno Kessler, Digital Ind Ctr, I-38123 Trento, Italy
[4] Univ Udine, Dept Math Comp Sci & Phys, I-33100 Udine, Italy
[5] Fed Univ Parana UFPR, Dept Elect Engn, BR-81530000 Curitiba, Brazil
[6] Pontif Catholic Univ Parana, Mech Engn Grad Program PPGEM, BR-80215901 Curitiba, Brazil
Keywords
convolutional neural network; deep learning; self-DIstillation with NO labels (DINO); video human action recognition; vision transformer architecture
DOI
10.3390/s23146384
CLC Classification Code
O65 [Analytical Chemistry];
Subject Classification Codes
070302; 081704;
Abstract
Due to its capacity to gather vast, high-level data about human activity from wearable or stationary sensors, human activity recognition substantially impacts people's day-to-day lives. Multiple people and objects may appear in a video, dispersed across the frame in various places; consequently, visual reasoning for the action recognition task requires modeling the interactions between many entities in the spatial dimensions. The main aim of this paper is to evaluate and map the current scenario of human action recognition in red, green, and blue (RGB) videos based on deep learning models. A residual network (ResNet) and a vision transformer (ViT) architecture with a semi-supervised learning approach are evaluated. DINO (self-DIstillation with NO labels) is used to enhance the potential of the ResNet and the ViT. The evaluated benchmark is the human motion database (HMDB51), which aims to better capture the richness and complexity of human actions. The results obtained for video classification with the proposed ViT are promising in light of performance metrics and results from the recent literature. A bi-dimensional ViT combined with long short-term memory (LSTM) demonstrated strong performance in human action recognition on the HMDB51 dataset, reaching 96.7 ± 0.35% and 41.0 ± 0.27% accuracy (mean ± standard deviation) in the train and test phases, respectively.
Pages: 15
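The clip-level pipeline the abstract describes — a frame-wise 2D backbone (e.g., a DINO-pretrained ViT) whose per-frame embeddings are aggregated over time by an LSTM and then classified into the 51 HMDB51 action categories — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: a random projection stands in for the ViT backbone, the LSTM weights are untrained, and all dimensions and function names are assumptions chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_embeddings(video, dim=64):
    # Stand-in for a frozen 2D backbone (e.g., a DINO-pretrained ViT):
    # maps each RGB frame to one feature vector. A random linear
    # projection is used here purely to illustrate the pipeline shape.
    n_frames = video.shape[0]
    flat = video.reshape(n_frames, -1)                      # (T, H*W*3)
    proj = rng.standard_normal((flat.shape[1], dim))
    return flat @ (proj / np.sqrt(flat.shape[1]))           # (T, dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_aggregate(seq, hidden=32):
    # Minimal single-layer LSTM cell run over the frame sequence;
    # the final hidden state summarizes the whole clip.
    d = seq.shape[1]
    w_x = rng.standard_normal((d, 4 * hidden)) * 0.1
    w_h = rng.standard_normal((hidden, 4 * hidden)) * 0.1
    bias = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for t in range(seq.shape[0]):
        gates = seq[t] @ w_x + h @ w_h + bias
        i, f, g, o = np.split(gates, 4)                     # input/forget/cell/output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h                                                # (hidden,)

# A toy "clip": 16 RGB frames of 32x32 pixels.
video = rng.random((16, 32, 32, 3))
feats = frame_embeddings(video)                             # (16, 64)
clip_repr = lstm_aggregate(feats)                           # (32,)
logits = clip_repr @ rng.standard_normal((32, 51))          # 51 HMDB51 classes
print(logits.shape)
```

In a real system the random projection would be replaced by forward passes through the pretrained ViT, and the LSTM and classification head would be trained on labeled HMDB51 clips.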