Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Authors
Liu, Minghua [1]
Li, Wenjing [1]
He, Bo [1]
Wang, Chuanxu [1]
Qu, Lianen [1, 2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266101, Peoples R China
[2] Xinjiang Inst Engn, Coll Informat Engn, Urumqi 830023, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05
Keywords
multi-attention; multi-scale; two-stream network; action recognition; transformer; C3D; NETWORK;
DOI
10.3390/app15052695
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional (C3D) network. In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to capture temporal dynamics and spatial features effectively. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Leveraging the multi-scale attention mechanism, the model captures and enhances features at various scales, which are then adaptively fused using a weighted strategy to improve feature representation. Furthermore, extensive experiments on feature fusion methods identify the optimal fusion strategy for the two-stream network. Experimental results on the UCF101 and HMDB51 benchmark datasets demonstrate that the proposed model achieves superior performance in action recognition tasks.
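The abstract does not say how the weighted fusion of multi-scale features is computed. As a rough illustration only, the sketch below assumes one scalar score per scale, normalized with a softmax, and a weighted sum over feature vectors of equal length. The names `fuse_scales` and `scale_logits` and the sample values are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, scale_logits):
    """Fuse same-length feature vectors from several scales.

    features: one feature vector per scale (hypothetical layout).
    scale_logits: one unnormalized score per scale. In a trained model
    these would be learned; here they are plain inputs (assumption).
    """
    if len(features) != len(scale_logits):
        raise ValueError("need exactly one logit per scale")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all scales must share one feature dimension")
    weights = softmax(scale_logits)
    # Weighted sum across scales, computed per feature dimension.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Three hypothetical scales with 4-dimensional features.
features = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
# Equal logits give equal weights, so the result is the plain average.
fused = fuse_scales(features, [0.0, 0.0, 0.0])
```

With equal scores the fusion reduces to an unweighted mean. Raising one score shifts the result toward that scale, which is one simple reading of "adaptive" weighting.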
Pages: 16