Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Cited by: 0
Authors
Liu, Minghua [1]
Li, Wenjing [1]
He, Bo [1]
Wang, Chuanxu [1]
Qu, Lianen [1,2]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266101, Peoples R China
[2] Xinjiang Inst Engn, Coll Informat Engn, Urumqi 830023, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 05
Keywords
multi-attention; multi-scale; two-stream network; action recognition; transformer; C3D
DOI
10.3390/app15052695
CLC Classification Number
O6 [Chemistry]
Subject Classification Code
0703
Abstract
To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition based on the fusion of a multi-scale attention Transformer and 3D convolution (C3D). In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to effectively capture temporal dynamics and spatial features. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Leveraging the multi-scale attention mechanism, the model captures and enhances features at various scales, which are then adaptively fused using a weighted strategy to improve feature representation. Furthermore, through extensive experiments on feature fusion methods, the optimal fusion strategy for the two-stream network is identified. Experimental results on the UCF101 and HMDB51 benchmark datasets demonstrate that the proposed model achieves superior performance in action recognition tasks.
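The abstract describes a two-stream design: a C3D temporal branch, a multi-scale convolutional Transformer spatial branch whose per-scale features are fused with learnable weights, and a final fusion of the two streams. The PyTorch sketch below is only an illustration of that idea; the module names (C3DBranch, MultiScaleTransformerBranch, TwoStreamActionNet), layer counts, kernel sizes, feature dimensions, and the simple additive stream fusion are assumptions for demonstration and are not taken from the paper.

```python
# Minimal illustrative sketch of a two-stream model with a C3D temporal branch
# and a multi-scale Transformer spatial branch. Hyperparameters are assumed.
import torch
import torch.nn as nn

class C3DBranch(nn.Module):
    """Temporal stream: 3D convolutions over a stacked clip (B, 3, T, H, W)."""
    def __init__(self, out_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(128, out_dim)

    def forward(self, clip):                      # clip: (B, 3, T, H, W)
        return self.fc(self.features(clip).flatten(1))   # (B, out_dim)

class MultiScaleTransformerBranch(nn.Module):
    """Spatial stream: multi-scale conv tokens fed to a Transformer encoder."""
    def __init__(self, out_dim=512, scales=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(3, out_dim, kernel_size=k, stride=8, padding=k // 2)
            for k in scales
        )
        layer = nn.TransformerEncoderLayer(d_model=out_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Learnable weights for adaptively fusing the per-scale features.
        self.scale_weights = nn.Parameter(torch.ones(len(scales)))

    def forward(self, frame):                     # frame: (B, 3, H, W)
        per_scale = []
        for conv in self.convs:
            tokens = conv(frame).flatten(2).transpose(1, 2)   # (B, N, out_dim)
            per_scale.append(self.encoder(tokens).mean(dim=1))
        w = torch.softmax(self.scale_weights, dim=0)
        return sum(wi * f for wi, f in zip(w, per_scale))     # (B, out_dim)

class TwoStreamActionNet(nn.Module):
    def __init__(self, num_classes=101):
        super().__init__()
        self.temporal = C3DBranch()
        self.spatial = MultiScaleTransformerBranch()
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, clip, frame):
        # Additive fusion is one simple option; the paper compares fusion strategies.
        fused = self.temporal(clip) + self.spatial(frame)
        return self.classifier(fused)

# Example usage (assumed input sizes):
# model = TwoStreamActionNet()
# logits = model(torch.randn(2, 3, 16, 112, 112), torch.randn(2, 3, 224, 224))
```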
Pages: 16
Related Papers
50 records in total
  • [1] Human Action Recognition Method Based on Multi-Attention Mechanism and Spatiotemporal Graph Convolution Networks
    Li, Xuanye
    Hao, Xingwei
    Jia, Jingong
    Zhou, Yuanfeng
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (07): 1055-1063
  • [2] Action Recognition Model Based on 3D Graph Convolution and Attention Enhanced
    Cao Yi
    Liu Chen
    Sheng Yongjian
    Huang Zilong
    Deng Xiaolong
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (07): 2071-2078
  • [3] Multi-stream adaptive 3D attention graph convolution network for skeleton-based action recognition
    Yu, Lubin
    Tian, Lianfang
    Du, Qiliang
    Bhutto, Jameel Ahmed
    APPLIED INTELLIGENCE, 2023, 53 (12): 14838-14854
  • [4] 3D Object Detection with LiDAR Based on Multi-Attention Mechanism
    Cao, Jie
    Peng, Yiqiang
    Fan, Likang
    Mo, Lingfan
    Wang, Longfei
    LASER & OPTOELECTRONICS PROGRESS, 2025, 62 (04)
  • [5] A review of video action recognition based on 3D convolution
    Huang, Xiankai
    Cai, Zhibin
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 108
  • [6] Spatiotemporal decoupling attention transformer for 3D skeleton-based driver action recognition
    Xu, Zhuoyan
    Xu, Jingke
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (04)
  • [7] Human gaze prediction for 3D light field display based on multi-attention fusion network
    Zhao, Meng
    Yan, Binbin
    Chen, Shuo
    Guo, Xiao
    Li, Ningchi
    Chen, Duo
    Wang, Kuiru
    Sang, Xinzhu
    OPTICS COMMUNICATIONS, 2024, 560
  • [8] Multi-Task Multi-Attention Transformer for Generative Named Entity Recognition
    Mo, Ying
    Liu, Jiahao
    Tang, Hongyin
    Wang, Qifan
    Xu, Zenglin
    Wang, Jingang
    Quan, Xiaojun
    Wu, Wei
    Li, Zhoujun
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 4171-4183
  • [9] Human Action Recognition based on 3D Convolution Neural Networks from RGBD Videos
    Al-Akam, Rawya
    Paulus, Dietrich
    Gharabaghi, Darius
    26TH INTERNATIONAL CONFERENCE IN CENTRAL EUROPE ON COMPUTER GRAPHICS, VISUALIZATION AND COMPUTER VISION (WSCG 2018), 2018, 2803: 18-26