Skeleton-weighted and multi-scale temporal-driven network for video action recognition

Cited by: 0
Authors
Xu, Ziqi [1]
Zhang, Jie [2, 3]
Zhang, Peng [2, 3]
Ding, Pengfei [4]
Affiliations
[1] Donghua Univ, Coll Comp Sci & Technol, Shanghai, Peoples R China
[2] Minist Educ, Engn Res Ctr Digitalized Textile & Fash Technol, Shanghai, Peoples R China
[3] Donghua Univ, Shanghai Engn Res Ctr Ind Big Data & Intelligent, Inst Artificial Intelligence, Shanghai, Peoples R China
[4] Donghua Univ, Coll Mech Engn, Shanghai, Peoples R China
Keywords
video action recognition; multi-modal; feature extraction; temporal modeling; feature fusion; RGB;
DOI
10.1117/1.JEI.33.6.063056
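Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;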
Abstract
Sequential and causal relationships among actions are critical for accurate video interpretation. Therefore, capturing both short-term and long-term temporal information is essential for effective action recognition. Current research, however, primarily focuses on fusing spatial features from diverse modalities for short-term action recognition, inadequately modeling the complex temporal dependencies in videos, leading to suboptimal performance. To address this limitation, we propose a skeleton-weighted and multi-scale temporal-driven action recognition network that integrates RGB and skeleton modalities to effectively capture both short-term and long-term temporal information. First, we propose a temporal-enhanced adaptive graph convolutional network. This network derives motion attention masks from the skeletal joints and transfers them to RGB videos to generate visually salient regions, thereby achieving a concise and effective input representation. Subsequently, we develop a multi-scale local-global temporal modeling network driven by a self-attention mechanism, which effectively captures fine-grained local details of individual actions along with global temporal relationships among actions across multiple temporal resolutions. Moreover, we design a multi-level adaptive temporal scale mixer module that efficiently integrates multi-scale features, creating a unified temporal feature representation to ensure temporal consistency. Finally, we conducted extensive experiments on the NTU-RGBD-60, NTU-RGBD-120, NW-UCLA, and Kinetics datasets to validate the effectiveness of the proposed method. © 2024 SPIE and IS&T
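The idea of transferring skeleton-derived motion attention to RGB frames can be illustrated with a small sketch. This is not the authors' network: the abstract describes a learned graph convolutional model, whereas the code below swaps in a hand-written heuristic (frame-to-frame joint displacement as the motion weight, Gaussian blobs as the mask). The function names, the `sigma` parameter, and the assumption that joints are given as 2D pixel coordinates are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_weights(joints):
    """Per-joint motion weights in [0, 1] (illustrative heuristic, not the paper's model).

    joints: array of shape (T, J, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (T, J).
    """
    # Displacement of each joint between consecutive frames: shape (T-1, J).
    disp = np.linalg.norm(np.diff(joints, axis=0), axis=-1)
    # Reuse the first displacement for frame 0 so the output has T rows.
    disp = np.concatenate([disp[:1], disp], axis=0)
    # Normalize by the largest displacement in each frame; if nothing moves,
    # the guard avoids division by zero and all weights stay 0.
    peak = disp.max(axis=1, keepdims=True)
    return disp / np.maximum(peak, 1e-8)


def attention_mask(joints_t, weights_t, height, width, sigma=8.0):
    """Build one (height, width) mask from Gaussian blobs centered on the joints."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width))
    for (x, y), w in zip(joints_t, weights_t):
        blob = w * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Keep the strongest response per pixel so overlapping blobs stay <= 1.
        mask = np.maximum(mask, blob)
    return mask


def weight_frames(frames, joints, sigma=8.0):
    """Multiply RGB frames of shape (T, H, W, 3) by skeleton-derived masks."""
    num_frames, height, width, _ = frames.shape
    weights = motion_weights(joints)
    masks = np.stack([
        attention_mask(joints[t], weights[t], height, width, sigma)
        for t in range(num_frames)
    ])
    # Broadcast each mask over the color channels.
    return frames * masks[..., None], masks


# Toy input: one static joint and one joint moving along the x axis.
frames = np.ones((3, 32, 32, 3))
joints = np.array([
    [[5, 5], [10, 10]],
    [[5, 5], [12, 10]],
    [[5, 5], [14, 10]],
], dtype=float)
weighted, masks = weight_frames(frames, joints, sigma=2.0)
```

In this toy input the mask concentrates around the moving joint and suppresses the region around the static one, which is the intuition behind keeping only visually salient regions as input. A learned model would replace both the displacement heuristic and the fixed Gaussian shape.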
Pages: 23