A multidimensional feature fusion network based on MGSE and TAAC for video-based human action recognition

Cited by: 5
Authors
Zhou, Shuang [1]
Xu, Hongji [1]
Bai, Zhiquan [1]
Du, Zhengfeng [1]
Zeng, Jiaqi [1]
Wang, Yang [1]
Wang, Yuhao [1]
Li, Shijie [1]
Wang, Mengmeng [1]
Li, Yiran [1]
Li, Jianjun [1]
Xu, Jie [1]
Affiliations
[1] Shandong Univ, Sch Informat Sci & Engn, 72 Binhai Rd, Qingdao 266237, Shandong, Peoples R China
Keywords
Human action recognition; Multidimensional feature; Multiscale convolution
DOI
10.1016/j.neunet.2023.09.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As intelligent technologies such as human-computer interaction have matured, human action recognition (HAR) has come into wide use in virtual reality, video surveillance, and other fields. However, current video-based HAR methods still cannot fully extract abstract action features, and action collection and recognition for special groups, such as prisoners and elderly people living alone, remain lacking. To address these problems, this paper proposes a multidimensional feature fusion network called P-MTSC3D, a parallel network based on context modeling and a temporal adaptive attention module. It consists of three branches. The first branch serves as the basic network branch and extracts basic feature information. The second branch consists of a feature pre-extraction layer and two multiscale-convolution-based global context modeling combined squeeze and excitation (MGSE) modules, which extract spatial and channel features. The third branch consists of two temporal adaptive attention units based on convolution (TAAC), which extract temporal features. To verify the validity of the proposed network, this paper conducts experiments on the University of Central Florida (UCF) 101 dataset and the human motion database (HMDB) 51 dataset. The recognition accuracy of the proposed P-MTSC3D network is 97.92% on the UCF101 dataset and 75.59% on the HMDB51 dataset. The P-MTSC3D network requires 30.85 GFLOPs, and its test time is 2.83 s per 16 samples on the UCF101 dataset. The experimental results demonstrate that the P-MTSC3D network has better overall performance than state-of-the-art networks. In addition, a prison action (PA) dataset is constructed in this paper to verify how well the proposed network works in real-world scenarios.
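For readers unfamiliar with the "squeeze and excitation" idea named in the abstract, the sketch below shows the generic channel-gating step in plain Python. It is not the paper's MGSE module: the multiscale convolution and global context modeling parts are omitted, and the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the flat-list data layout are all assumptions made for illustration.

```python
import math


def squeeze_excite(channels, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative, not the paper's MGSE).

    channels: list of C flat lists, one per channel, holding that channel's
              activations over all frames and pixels (hypothetical layout).
    w1:       reduction weights, shape (C // r) x C, for a hypothetical ratio r.
    w2:       expansion weights, shape C x (C // r).
    """
    # Squeeze: global average over time and space gives one value per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C values and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


# Toy input: 2 channels with 4 activations each, bottleneck width 1.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
w1 = [[1.0, 0.0]]        # hidden unit reads the mean of channel 0
w2 = [[1.0], [-1.0]]     # channel 0 is boosted, channel 1 is suppressed
scaled, gates = squeeze_excite(features, w1, w2)
print(gates)
```

In this toy case the first channel receives a gate above 0.5 and the second a gate below 0.5, so the gate acts as a learned per-channel importance weight.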
Pages: 496-507
Page count: 12
Related papers
50 in total
  • [31] Complex Wavelet Feature Extraction for Video-based Face Recognition
    Zhang, Ping
    IEEE SOUTHEASTCON 2010: ENERGIZING OUR FUTURE, 2010, : 440 - 443
  • [32] Feature Subspace Determination in Video-based Mismatched Face Recognition
    Choi, Jae Young
    Ro, Yong Man
    Plataniotis, Konstantinos N.
    2008 8TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2008), VOLS 1 AND 2, 2008, : 158 - +
  • [33] Multiscale Temporal Network for Video-Based Gait Recognition
    Wu, Xinhui
    Yu, Shiqi
    Huang, Yongzhen
    BIOMETRIC RECOGNITION (CCBR 2019), 2019, 11818 : 75 - 83
  • [34] Video-based face recognition based on deep convolutional neural network
    Zhai, Yilong
    He, Dongzhi
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO AND SIGNAL PROCESSING (IVSP 2019), 2019, : 23 - 27
  • [35] Few-Shot Action Recognition in Video Based on Multi-Feature Fusion
    Pu Z.-X.
    Ge Y.
    Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (03): : 594 - 608
  • [36] A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications
    Preksha Pareek
    Ankit Thakkar
    Artificial Intelligence Review, 2021, 54 : 2259 - 2322
  • [37] Self-Supervised Video-Based Action Recognition With Disturbances
    Lin, Wei
    Ding, Xinghao
    Huang, Yue
    Zeng, Huanqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2493 - 2507
  • [38] A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications
    Pareek, Preksha
    Thakkar, Ankit
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (03) : 2259 - 2322
  • [39] Recent Advances in Video-Based Human Action Recognition using Deep Learning: A Review
    Wu, Di
    Sharma, Nabin
    Blumenstein, Michael
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2865 - 2872
  • [40] HEROES: A Video-Based Human Emotion Recognition Database
    Mannocchi, Ilaria
    Lamichhane, Kamal
    Carli, Marco
    Battisti, Federica
    2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2022,