Parallel Self-Attention and Spatial-Attention Fusion for Human Pose Estimation and Running Movement Recognition

Cited by: 0
Authors
Wu, Qingtian [1 ]
Zhang, Yu [2 ,3 ]
Zhang, Liming [1 ]
Yu, Haoyong [4 ]
Affiliations
[1] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Macau, Peoples R China
[2] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
[3] Shenyang Univ Chem Technol, Comp Sci & Technol Coll, Shenyang 110142, Peoples R China
[4] Natl Univ Singapore, Dept Biomed Engn, Singapore 119077, Singapore
Keywords
Transformers; Semantics; Pose estimation; Feature extraction; Convolutional neural networks; Task analysis; Visualization; Feature fusion; human pose estimation (HPE); running recognition; self-attention; spatial attention;
DOI
10.1109/TCDS.2023.3275652
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human pose estimation (HPE) is a fundamental yet promising visual recognition problem. Existing popular methods either directly add local features element-wise (e.g., Hourglass and its variants) or learn global relationships among different human parts (e.g., vision transformers). However, effectively integrating local and global representations for accurate HPE remains an open problem. In this work, we design four feature fusion strategies on the hierarchical ResNet structure: direct channel concatenation, element-wise addition, and two parallel structures. Both parallel structures adopt a naive self-attention encoder to model global dependencies; they differ in that one adopts the original ResNet BottleNeck while the other employs a spatial-attention module (named SSF) to learn local patterns. Experiments on COCO Keypoint 2017 show that our SSF-based model for HPE (named SSPose) achieves the best average precision with acceptable computational cost among the compared state-of-the-art methods. In addition, we build a lightweight running data set to verify the effectiveness of SSPose. Based solely on the keypoints estimated by SSPose, we propose a regression model that identifies valid running movements without training any additional classifier. Our source code and running data set are publicly available.
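The abstract describes the core fusion idea: a self-attention branch modeling global dependencies runs in parallel with a spatial-attention branch capturing local patterns, and the two outputs are fused. The PyTorch sketch below illustrates one plausible reading of that parallel structure; it assumes a CBAM-style spatial-attention gate, a single multi-head self-attention encoder, and channel concatenation followed by a 1x1 convolution as the fusion step. The class names (ParallelFusionBlock, SpatialAttention) and all hyperparameters are illustrative assumptions, not the authors' SSF/SSPose implementation.

# Minimal sketch of the parallel global/local fusion idea described in the abstract.
# Names and design details are hypothetical, not the paper's actual SSF module.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: gate each location using pooled channel statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)   # (B, 1, H, W)
        max_pool = x.amax(dim=1, keepdim=True)   # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * gate                           # locally re-weighted features


class ParallelFusionBlock(nn.Module):
    """Run self-attention (global branch) and spatial attention (local branch) in parallel, then fuse."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.global_branch = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.local_branch = SpatialAttention()
        # Fuse by channel concatenation + 1x1 convolution (an assumed fusion choice).
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence for attention
        global_feat, _ = self.global_branch(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        local_feat = self.local_branch(x)
        return self.fuse(torch.cat([global_feat, local_feat], dim=1))


if __name__ == "__main__":
    feat = torch.randn(2, 64, 16, 16)             # toy backbone feature map
    out = ParallelFusionBlock(64)(feat)
    print(out.shape)                              # torch.Size([2, 64, 16, 16])

In a full HPE pipeline such a block would sit on top of hierarchical backbone features before the keypoint-heatmap head; the sketch only shows the fusion step itself.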
Pages: 358-368
Number of pages: 11