Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Cited by: 0
Authors
Liu, Minghua [1]
Li, Wenjing [1]
He, Bo [1]
Wang, Chuanxu [1]
Qu, Lianen [1,2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266101, Peoples R China
[2] Xinjiang Inst Engn, Coll Informat Engn, Urumqi 830023, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05
Keywords
multi-attention; multi-scale; two-stream network; action recognition; transformer; C3D; NETWORK;
DOI
10.3390/app15052695
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional (C3D) network. In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to capture both temporal dynamics and spatial features. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Using the multi-scale attention mechanism, the model captures and enhances features at several scales, which are then adaptively fused with a weighted strategy to improve feature representation. Furthermore, extensive experiments on feature fusion methods identify the optimal fusion strategy for the two-stream network. Experimental results on benchmark datasets such as UCF101 and HMDB51 demonstrate that the proposed model achieves superior performance in action recognition tasks.
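The abstract does not say how the weighted fusion of multi-scale features is computed. As a minimal sketch of one common approach, and not the authors' method, the Python below combines equal-length feature vectors from several scales using softmax-normalized weights. The function names `softmax` and `fuse_weighted` and the per-scale `scores` are hypothetical; in a trained model the scores would be learned rather than passed in by hand.

```python
import math


def softmax(scores):
    """Turn raw per-scale scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_weighted(features, scores):
    """Weighted sum of equal-length feature vectors, one vector per scale.

    Assumption: each scale contributes one flat feature vector, and
    `scores` holds one (hypothetically learned) scalar per scale.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two scales with equal scores reduce to a plain average.
    print(fuse_weighted([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

With equal scores the fusion is a plain average; raising one scale's score shifts the fused vector toward that scale's features, which is the sense in which such a scheme is adaptive.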
Pages: 16