A multidimensional feature fusion network based on MGSE and TAAC for video-based human action recognition

Cited by: 5
Authors
Zhou, Shuang [1 ]
Xu, Hongji [1 ]
Bai, Zhiquan [1 ]
Du, Zhengfeng [1 ]
Zeng, Jiaqi [1 ]
Wang, Yang [1 ]
Wang, Yuhao [1 ]
Li, Shijie [1 ]
Wang, Mengmeng [1 ]
Li, Yiran [1 ]
Li, Jianjun [1 ]
Xu, Jie [1 ]
Affiliations
[1] Shandong Univ, Sch Informat Sci & Engn, 72 Binhai Rd, Qingdao 266237, Shandong, Peoples R China
Keywords
Human action recognition; Multidimensional feature; Multiscale convolution;
DOI
10.1016/j.neunet.2023.09.031
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the maturity of intelligent technologies such as human-computer interaction, human action recognition (HAR) has been widely applied in virtual reality, video surveillance, and other fields. However, current video-based HAR methods still cannot fully extract abstract action features, and action collection and recognition for special groups such as prisoners and elderly people living alone remain largely unaddressed. To solve these problems, this paper proposes a multidimensional feature fusion network, called P-MTSC3D, a parallel network based on context modeling and a temporal adaptive attention module. It consists of three branches. The first branch serves as the basic network branch, which extracts basic feature information. The second branch consists of a feature pre-extraction layer and two multiscale-convolution-based global context modeling combined squeeze-and-excitation (MGSE) modules, which extract spatial and channel features. The third branch consists of two temporal adaptive attention units based on convolution (TAAC), which extract temporal-dimension features. To verify the validity of the proposed network, experiments are conducted on the University of Central Florida (UCF) 101 dataset and the Human Motion Database (HMDB) 51 dataset. The recognition accuracy of the proposed P-MTSC3D network is 97.92% on the UCF101 dataset and 75.59% on the HMDB51 dataset. The FLOPs of the P-MTSC3D network are 30.85G, and the test time is 2.83 s per 16 samples on the UCF101 dataset. The experimental results demonstrate that the P-MTSC3D network achieves better overall performance than state-of-the-art networks. In addition, a prison action (PA) dataset is constructed in this paper to verify the effectiveness of the proposed network in real-world scenarios.
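The parallel three-branch design described in the abstract can be illustrated schematically. The sketch below is a hedged simplification, not the paper's actual architecture: the functions `basic_branch`, `mgse_branch`, and `taac_branch` are placeholder stand-ins for the basic, MGSE, and TAAC branches, and the fusion step is shown as plain concatenation; it only demonstrates the general idea of fusing per-branch features into one multidimensional representation.

```python
# Illustrative sketch of a parallel three-branch feature fusion scheme,
# loosely following the abstract's description of P-MTSC3D.
# The branch functions below are placeholders, NOT the paper's layers.

def basic_branch(x):
    # Branch 1: basic feature extraction (identity stand-in).
    return list(x)

def mgse_branch(x):
    # Branch 2: spatial/channel features (stand-in: squared values,
    # loosely mimicking a channel re-weighting effect).
    return [v * v for v in x]

def taac_branch(x):
    # Branch 3: temporal features (stand-in: first-order differences
    # along the "temporal" axis, zero-padded to keep the length).
    return [b - a for a, b in zip(x, x[1:])] + [0.0]

def fuse(x):
    # Fusion by concatenating the three parallel branch outputs.
    return basic_branch(x) + mgse_branch(x) + taac_branch(x)

features = fuse([1.0, 2.0, 4.0, 8.0])
print(len(features))  # 3 branches x 4 input features = 12
```

In the actual network each branch would be a stack of 3D convolutional and attention layers operating on video clips, but the fusion principle (parallel branches, one combined feature vector) is the same.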
Pages: 496 - 507
Page count: 12
Related Papers
50 records total
  • [21] An Interpretable Deep Learning-Based Feature Reduction in Video-Based Human Activity Recognition
    Dutt, Micheal
    Goodwin, Morten
    Omlin, Christian W.
    IEEE ACCESS, 2024, 12 : 187947 - 187963
  • [22] Video-based bird posture recognition using dual feature-rates deep fusion convolutional neural network
    Lin, Chih-Wei
    Chen, Zhongsheng
    Lin, Mengxiang
    ECOLOGICAL INDICATORS, 2022, 141
  • [23] A deep learning method for video-based action recognition
    Zhang, Guanwen
    Rao, Yukun
    Wang, Changhao
    Zhou, Wei
    Ji, Xiangyang
    IET IMAGE PROCESSING, 2021, 15 (14) : 3498 - 3511
  • [24] A Review on Video-Based Human Activity Recognition
    Ke, Shian-Ru
    Hoang Le Uyen Thuc
    Lee, Yong-Jin
    Hwang, Jenq-Neng
    Yoo, Jang-Hee
    Choi, Kyoung-Ho
    COMPUTERS, 2013, 2 (02) : 88 - 131
  • [25] FlowerAction: a federated deep learning framework for video-based human action recognition
    Thi Quynh Khanh Dinh
    Thanh-Hai Tran
    Trung-Kien Tran
    Thi-Lan Le
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2025, 16 (2) : 459 - 470
  • [26] Infinite Gaussian Fisher Vector to Support Video-Based Human Action Recognition
    Fernandez-Ramirez, Jorge L.
    Alvarez-Meza, Andres M.
    Orozco-Gutierrez, Alvaro A.
    David Echeverry-Correa, Julian
    ADVANCES IN VISUAL COMPUTING, ISVC 2019, PT II, 2019, 11845 : 38 - 49
  • [27] Video-Based Lifting Action Recognition Using Rank-Altered Kinematic Feature Pairs
    Jung, Sehee
    Su, Bingyi
    Lu, Lu
    Qing, Liwei
    Xu, Xu
    HUMAN FACTORS, 2024,
  • [28] Human Action Recognition Based on Feature Level Fusion and Random Projection
    Wang, Miao
    Sun, Jifeng
    Yu, Jialin
    PROCEEDINGS OF 2016 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2016, : 767 - 770
  • [29] Human Action Recognition Based On Multi-level Feature Fusion
    Xu, Y. Y.
    Xiao, G. Q.
    Tang, X. Q.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL APPLICATIONS (CISIA 2015), 2015, 18 : 353 - 355
  • [30] Video-based Skeletal Feature Extraction for Hand Gesture Recognition
    Lim, Kim Chwee
    Sin, Swee Heng
    Lee, Chien Wei
    Chin, Weng Khin
    Lin, Junliang
    Nguyen, Khang
    Nguyen, Quang H.
    Nguyen, Binh P.
    Chua, Matthew
    ICMLSC 2020: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING, 2020, : 108 - 112