Multimodal multilevel attention for semi-supervised skeleton-based gesture recognition

Cited by: 0
Authors
Liu, Jinting [1]
Gan, Minggang [1]
He, Yuxuan [1]
Guo, Jia [1]
Hu, Kang [1]
Affiliations
[1] Beijing Inst Technol, Sch Automat, State Key Lab Intelligent Control & Decis Complex, Beijing, Peoples R China
Keywords
Gesture recognition; Skeleton; Self-attention; Semi-supervised; Deep learning; NEURAL-NETWORKS; FUSION
DOI
10.1007/s40747-025-01807-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although supervised skeleton-based gesture recognition has achieved promising results, its reliance on extensive annotated data incurs significant costs. This paper addresses semi-supervised skeleton-based gesture recognition, which aims to learn feature representations effectively from both labeled and unlabeled data. To this end, we propose a novel multimodal multilevel attention network designed for semi-supervised learning. The model uses the self-attention mechanism to aggregate complementary multimodal and multilevel semantic information from the hand skeleton, and introduces a multimodal multilevel contrastive loss to measure feature similarity. Specifically, our method explores the relationships among the joint, bone, and motion modalities to learn more discriminative feature representations. Considering the hierarchy of the hand skeleton, the skeleton data is divided into multiple levels to capture complementary semantic information. Furthermore, the multimodal contrastive loss measures similarity among these multilevel representations. Experiments on the SHREC-17 and DHG 14/28 datasets show that the proposed method improves performance on semi-supervised skeleton-based gesture recognition tasks.
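The abstract names three modalities (joint, bone, motion) and a contrastive loss but does not define them. The sketch below is not the authors' implementation. It assumes the convention common in skeleton-based recognition, where a bone is the vector from a parent joint to its child and motion is the frame-to-frame joint displacement, and it uses a generic InfoNCE-style loss as a stand-in for the paper's multimodal multilevel contrastive loss. The names `derive_modalities`, `info_nce`, `parents`, and `temperature` are hypothetical.

```python
import numpy as np


def derive_modalities(joints, parents):
    """Derive bone and motion streams from joint coordinates.

    joints:  array of shape (T, J, C) = frames, joints, coordinates.
    parents: length-J sequence; parents[j] is the parent joint of j
             (assumption: the root joint lists itself as parent).
    """
    joints = np.asarray(joints, dtype=float)
    parents = np.asarray(parents)
    # Bone: vector from each joint's parent to the joint (zero at the root).
    bones = joints - joints[:, parents, :]
    # Motion: displacement between consecutive frames (zero at frame 0).
    motion = np.zeros_like(joints)
    motion[1:] = joints[1:] - joints[:-1]
    return joints, bones, motion


def info_nce(a, b, temperature=0.1):
    """Generic InfoNCE loss; row i of `a` and row i of `b` are positives.

    a, b: embeddings of shape (N, D), e.g. features of the same sample
          from two modalities. This is an assumed stand-in, not the
          loss defined in the paper.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, the loss is small when embeddings of the same sample from two modalities agree and large when they are mismatched, which is the sense in which a contrastive loss "measures feature similarity".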
Pages: 16
Related papers
50 in total
  • [1] Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition
    Mahmud, Hasan
    Morshed, Mashrur M.
    Hasan, Md. Kamrul
    VISUAL COMPUTER, 2024, 40 (01): 11-25
  • [2] X-Invariant Contrastive Augmentation and Representation Learning for Semi-Supervised Skeleton-Based Action Recognition
    Xu, Binqian
    Shu, Xiangbo
    Song, Yan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31: 3852-3867
  • [3] Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition
    Hasan Mahmud
    Mashrur M. Morshed
    Md. Kamrul Hasan
    The Visual Computer, 2024, 40: 11-25
  • [4] A Hand Gesture Recognition Model Based on Semi-supervised Learning
    Tao, Meiping
    Ma, Li
    2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL II, 2015
  • [5] Semi-Supervised Skeleton-Based Covert Cheating Detection in Electronic-Exams
    Atabay, Habibollah Agh
    Hassanpour, Hamid
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2024, 48 (04): 1539-1551
  • [6] Skeleton-based Dynamic hand gesture recognition
    De Smedt, Quentin
    Wannous, Hazem
    Vandeborre, Jean-Philippe
    PROCEEDINGS OF 29TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2016), 2016: 1206-1214
  • [7] Multi-Granularity Anchor-Contrastive Representation Learning for Semi-Supervised Skeleton-Based Action Recognition
    Shu, Xiangbo
    Xu, Binqian
    Zhang, Liyan
    Tang, Jinhui
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06): 7559-7576
  • [8] Semi-Supervised Learning for Surface EMG-based Gesture Recognition
    Du, Yu
    Wong, Yongkang
    Jin, Wenguang
    Wei, Wentao
    Hu, Yu
    Kankanhalli, Mohan
    Geng, Weidong
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 1624-1630
  • [9] HAN: An efficient hierarchical self-attention network for skeleton-based gesture recognition
    Liu, Jianbo
    Wang, Ying
    Xiang, Shiming
    Pan, Chunhong
    PATTERN RECOGNITION, 2025, 162
  • [10] A semi-supervised human action recognition algorithm based on skeleton feature
    Yuan, Hejin
    Journal of Information Hiding and Multimedia Signal Processing, 2015, 6 (01): 175-182