Skeleton-weighted and multi-scale temporal-driven network for video action recognition

Authors
Xu, Ziqi [1]
Zhang, Jie [2, 3]
Zhang, Peng [2, 3]
Ding, Pengfei [4]
Affiliations
[1] Donghua Univ, Coll Comp Sci & Technol, Shanghai, Peoples R China
[2] Minist Educ, Engn Res Ctr Digitalized Textile & Fash Technol, Shanghai, Peoples R China
[3] Donghua Univ, Shanghai Engn Res Ctr Ind Big Data & Intelligent, Inst Artificial Intelligence, Shanghai, Peoples R China
[4] Donghua Univ, Coll Mech Engn, Shanghai, Peoples R China
Keywords
video action recognition; multi-model; feature extraction; temporal modeling; feature fusion; RGB;
DOI
10.1117/1.JEI.33.6.063056
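Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;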
Abstract
Sequential and causal relationships among actions are critical for accurate video interpretation, so capturing both short-term and long-term temporal information is essential for effective action recognition. Current research, however, focuses primarily on fusing spatial features from diverse modalities for short-term action recognition; it models the complex temporal dependencies in videos inadequately, which leads to suboptimal performance. To address this limitation, we propose a skeleton-weighted and multi-scale temporal-driven action recognition network that integrates RGB and skeleton modalities to capture both short-term and long-term temporal information. First, we propose a temporal-enhanced adaptive graph convolutional network that derives motion attention masks from the skeletal joints and transfers them to RGB videos to generate visually salient regions, yielding a concise and effective input representation. Subsequently, we develop a multi-scale local-global temporal modeling network driven by a self-attention mechanism, which captures fine-grained local details of individual actions along with global temporal relationships among actions across multiple temporal resolutions. Moreover, we design a multi-level adaptive temporal scale mixer module that integrates multi-scale features into a unified temporal feature representation to ensure temporal consistency. Finally, we conduct extensive experiments on the NTU-RGBD-60, NTU-RGBD-120, NW-UCLA, and Kinetics datasets to validate the effectiveness of the proposed method. © 2024 SPIE and IS&T
Pages: 23
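Illustrative sketch
The abstract describes deriving motion attention masks from skeletal joints and transferring them to RGB frames. The following is a minimal sketch of that general idea only, not the authors' network: weighting joints by frame-to-frame displacement, rendering Gaussian blobs, and the names `joint_motion_weights`, `attention_mask`, `apply_mask`, and `sigma` are all assumptions made for illustration.

```python
import math


def joint_motion_weights(prev_joints, curr_joints):
    """Weight each joint by its displacement between two frames (assumed heuristic)."""
    disp = [math.dist(p, c) for p, c in zip(prev_joints, curr_joints)]
    total = sum(disp)
    if total == 0:
        # No motion at all: fall back to uniform weights.
        return [1.0 / len(disp)] * len(disp)
    return [d / total for d in disp]


def attention_mask(joints, weights, height, width, sigma=2.0):
    """Render weighted Gaussian blobs at joint positions, scaled so the peak is 1."""
    mask = [[0.0] * width for _ in range(height)]
    for (jx, jy), w in zip(joints, weights):
        for y in range(height):
            for x in range(width):
                d2 = (x - jx) ** 2 + (y - jy) ** 2
                mask[y][x] += w * math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in mask)
    if peak > 0:
        mask = [[v / peak for v in row] for row in mask]
    return mask


def apply_mask(frame, mask):
    """Multiply a single-channel frame by the mask, pixel by pixel."""
    return [[p * m for p, m in zip(frow, mrow)] for frow, mrow in zip(frame, mask)]


# Toy example: two joints as (x, y) pixel positions; only the second one moves.
prev_joints = [(2, 2), (6, 6)]
curr_joints = [(2, 2), (8, 6)]
weights = joint_motion_weights(prev_joints, curr_joints)
mask = attention_mask(curr_joints, weights, height=10, width=10)
frame = [[100.0] * 10 for _ in range(10)]  # flat grey frame
masked = apply_mask(frame, mask)
```

In this toy case the moving joint keeps full intensity while the region around the static joint is suppressed. The paper's learned masks and graph-convolutional features would replace the hand-written displacement heuristic used here.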