Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Cited by: 0
Authors
Liu, Minghua [1]
Li, Wenjing [1]
He, Bo [1]
Wang, Chuanxu [1]
Qu, Lianen [1, 2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266101, Peoples R China
[2] Xinjiang Inst Engn, Coll Informat Engn, Urumqi 830023, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 05
Keywords
multi-attention; multi-scale; two-stream network; action recognition; transformer; C3D; NETWORK;
DOI
10.3390/app15052695
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional (C3D) network. In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to capture temporal dynamics together with spatial features. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Using the multi-scale attention mechanism, the model captures and enhances features at several scales, which are then adaptively fused with a weighted strategy to improve the feature representation. Furthermore, extensive experiments on feature fusion methods identify the optimal fusion strategy for the two-stream network. Experimental results on the UCF101 and HMDB51 benchmark datasets demonstrate that the proposed model achieves superior performance in action recognition tasks.
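The abstract does not say how the adaptive weighted fusion is computed. As a minimal sketch only, the Python below assumes one common choice: each scale gets a scalar score, the scores are normalized with a softmax, and the feature vectors are summed with those weights. The function names (`softmax`, `weighted_fusion`) and the example scores are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, scores):
    """Fuse equal-length feature vectors with softmax-normalized weights.

    features: one feature vector (list of floats) per scale or stream.
    scores:   one raw score per vector; assumed here to be learned
              parameters, which is an assumption and not the paper's method.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]


# Hypothetical features from three scales, with made-up scores.
multi_scale = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.1]]
fused = weighted_fusion(multi_scale, [1.0, 0.5, -0.5])
print(fused)
```

Under the same assumption, the function could also combine the class scores of the spatial and temporal streams by passing the two score vectors as `features`. The abstract states only that the best fusion strategy was chosen experimentally, so this is one candidate and not the strategy the authors selected.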
Pages: 16
Related papers
50 in total
  • [21] 3D RANs: 3D Residual Attention Networks for action recognition
    Cai, Jiahui
    Hu, Jianguo
    VISUAL COMPUTER, 2020, 36 (06): : 1261 - 1270
  • [22] 3D RANs: 3D Residual Attention Networks for action recognition
    Jiahui Cai
    Jianguo Hu
    The Visual Computer, 2020, 36 : 1261 - 1270
  • [23] Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data
    Han, Yun
    Chung, Sheng-Luen
    Xiao, Qiang
    Lin, Wei You
    Su, Shun-Feng
    IEEE ACCESS, 2020, 8 : 88604 - 88616
  • [24] Two Stream Multi-Attention Graph Convolutional Network for Skeleton-Based Action Recognition
    Zhou, Huijian
    Tian, Zhiqiang
    Du, Shaoyi
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2023, 2024, 1998 : 112 - 120
  • [25] Multi-Head Structural Attention-Based Vision Transformer with Sequential Views for 3D Object Recognition
    Bao, Jianjun
    Luo, Ke
    Kou, Qiqi
    He, Liang
    Zhao, Guo
    APPLIED SCIENCES-BASEL, 2025, 15 (06):
  • [26] SpATr: MoCap 3D human action recognition based on spiral auto-encoder and transformer network
    Bouzid, Hamza
    Ballihi, Lahoucine
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
  • [27] FACIAL EXPRESSION RECOGNITION ALGORITHM BASED ON MULTI-ATTENTION MECHANISM
    Wu, Huixin
    Huang, Zehuan
    Jiang, Wei
    Zhao, Xin
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2023, 19 (04): : 1239 - 1250
  • [28] Efficient Parallel Inflated 3D Convolution Architecture for Action Recognition
    Huang, Yukun
    Guo, Yongcai
    Gao, Chao
    IEEE ACCESS, 2020, 8 : 45753 - 45765
  • [29] Deformable Pose Traversal Convolution for 3D Action and Gesture Recognition
    Weng, Junwu
    Liu, Mengyuan
    Jiang, Xudong
    Yuan, Junsong
    COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 142 - 157
  • [30] 3D Deformable Convolution Temporal Reasoning network for action recognition
    Ou, Yangjun
    Chen, Zhenzhong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 93