Multimodal Fusion Framework Based on Statistical Attention and Contrastive Attention for Sign Language Recognition

Citations: 16
Authors
Zhang, Jiangtao [1 ]
Wang, Qingshan [1 ]
Wang, Qi [1 ]
Zheng, Zhiwen [1 ]
Affiliation
[1] Hefei Univ Technol, Sch Math, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gesture recognition; Assistive technologies; Feature extraction; Skeleton; Hidden Markov models; Motion detection; Robot sensing systems; Sign language recognition; wearable computing; multimodal fusion; sEMG; deep learning; LAPLACIAN OPERATOR; FIELD; SHAPE;
DOI
10.1109/TMC.2023.3235935
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Sign language recognition (SLR) enables hearing-impaired people to communicate better with able-bodied individuals. The diversity of multiple modalities can be exploited to improve SLR; however, existing multimodal fusion methods do not model inter-modal relationships in depth. This paper proposes SeeSign, a multimodal fusion framework based on statistical attention and contrastive attention for SLR. The two attention mechanisms investigate the intra-modal and inter-modal correlations of surface electromyography (sEMG) and inertial measurement unit (IMU) signals and fuse the two modalities. Statistical attention uses the Laplace operator and a lower quantile to select and enhance active features within each modal feature clip. Contrastive attention calculates the information gain of the active features in a pair of enhanced feature clips located at the same position in the two modalities; the enhanced clips are then fused at their positions according to that gain. The fused multimodal features are fed into a Transformer-based network trained with connectionist temporal classification and cross-entropy losses for SLR. Experimental results show that SeeSign achieves an accuracy of 93.17% on isolated words, and word error rates of 18.34% and 22.08% on one-handed and two-handed sign language datasets, respectively, outperforming state-of-the-art methods in accuracy and robustness.
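As a loose illustration of the two mechanisms named in the abstract (not the paper's implementation, whose details are not given here), the sketch below approximates statistical attention as a temporal-Laplacian activity score thresholded at a lower quantile, and contrastive attention as an entropy-based gain that weights the position-wise fusion of two aligned clips. All function names, the entropy proxy for information gain, and the weighting scheme are assumptions.

```python
import numpy as np

def statistical_attention(clip, q=0.25):
    """Hypothetical sketch of statistical attention: score per-feature
    activity in a (time x features) clip with a discrete Laplace operator,
    then enhance features whose activity exceeds a lower-quantile cutoff."""
    lap = np.abs(np.diff(clip, n=2, axis=0))   # second temporal difference
    activity = lap.mean(axis=0)                # per-feature activity score
    thresh = np.quantile(activity, q)          # lower-quantile threshold
    active = activity > thresh                 # "active" feature mask
    # Amplify active features; leave the rest unchanged.
    weights = np.where(active, 1.0 + activity / (activity.max() + 1e-8), 1.0)
    return clip * weights

def contrastive_fusion(clip_a, clip_b, eps=1e-8):
    """Hypothetical sketch of contrastive attention: weight two aligned
    enhanced clips (e.g. sEMG and IMU) by an information-gain-style score
    and fuse them position-wise."""
    def neg_entropy_gain(x):
        # Concentrated feature energy -> low entropy -> higher gain.
        p = np.abs(x).sum(axis=0)
        p = p / (p.sum() + eps)
        h = -(p * np.log(p + eps)).sum()
        return 1.0 / (h + eps)

    g_a, g_b = neg_entropy_gain(clip_a), neg_entropy_gain(clip_b)
    w_a = g_a / (g_a + g_b)
    return w_a * clip_a + (1.0 - w_a) * clip_b
```

A plausible usage, per the abstract's pipeline, would be `contrastive_fusion(statistical_attention(emg_clip), statistical_attention(imu_clip))`, with the fused features then fed to a Transformer-based recognizer.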
Pages: 1431-1443 (13 pages)
Related Papers (50 in total)
  • [1] A multimodal framework for sensor based sign language recognition
    Kumar, Pradeep
    Gauba, Himaanshu
    Roy, Partha Pratim
    Dogra, Debi Prosad
    NEUROCOMPUTING, 2017, 259 : 21 - 38
  • [2] A Cross-Attention BERT-Based Framework for Continuous Sign Language Recognition
    Zhou, Zhenxing
    Tam, Vincent W. L.
    Lam, Edmund Y.
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1818 - 1822
  • [4] Multi-Modal Fusion Sign Language Recognition Based on Residual Network and Attention Mechanism
    Chu, Chaoqin
    Xiao, Qinkun
    Zhang, Yinhuan
    Liu, Xing
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (12)
  • [5] Sign language recognition based on global-local attention
    Zhang, Shujun
    Zhang, Qun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 80
  • [6] Sign, Attend and Tell: Spatial Attention for Sign Language Recognition
    Sarhan, Noha
    Frintrop, Simone
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021,
  • [7] Holistic-Based Cross-Attention Modal Fusion Network for Video Sign Language Recognition
    Gao, Qing
    Hu, Jing
    Mai, Haixing
    Ju, Zhaojie
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [8] Fusion of Attention-Based Convolution Neural Network and HOG Features for Static Sign Language Recognition
    Kumari, Diksha
    Anand, Radhey Shyam
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [9] A Multimodal Fusion Model Based on Hybrid Attention Mechanism for Gesture Recognition
    Li, Yajie
    Chen, Yiqiang
    Gu, Yang
    Ouyang, Jianquan
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2020, 2021, 12644 : 302 - 312
  • [10] A multi-modal fusion framework for continuous sign language recognition based on multi-layer self-attention mechanism
    Xue, Cuihong
    Yu, Ming
    Yan, Gang
    Qin, Mengxian
    Liu, Yuehao
    Jia, Jingli
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (04) : 4303 - 4316