Multimodal Fusion Framework Based on Statistical Attention and Contrastive Attention for Sign Language Recognition

Citations: 16
Authors
Zhang, Jiangtao [1 ]
Wang, Qingshan [1 ]
Wang, Qi [1 ]
Zheng, Zhiwen [1 ]
Affiliation
[1] Hefei Univ Technol, Sch Math, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gesture recognition; Assistive technologies; Feature extraction; Skeleton; Hidden Markov models; Motion detection; Robot sensing systems; Sign language recognition; wearable computing; multimodal fusion; sEMG; deep learning; LAPLACIAN OPERATOR; FIELD; SHAPE;
DOI
10.1109/TMC.2023.3235935
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Sign language recognition (SLR) enables hearing-impaired people to communicate better with able-bodied individuals. The diversity of multiple modalities can be exploited to improve SLR; however, existing multimodal fusion methods do not model inter-modal relationships in depth. This paper proposes SeeSign, a multimodal fusion framework based on statistical attention and contrastive attention for SLR. The two attention mechanisms investigate the intra-modal and inter-modal correlations of surface electromyography (sEMG) and inertial measurement unit (IMU) signals and fuse the two modalities. Statistical attention uses the Laplace operator and a lower quantile to select and enhance active features within each modal feature clip. Contrastive attention calculates the information gain of the active features in a pair of enhanced feature clips located at the same position in the two modalities; the enhanced clips are then fused at their positions according to that gain. The fused multimodal features are fed into a Transformer-based network trained with connectionist temporal classification and cross-entropy losses for SLR. Experimental results show that SeeSign achieves an accuracy of 93.17% on isolated words, and word error rates of 18.34% and 22.08% on one-handed and two-handed sign language datasets, respectively, outperforming state-of-the-art methods in accuracy and robustness.
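As a loose illustration of the two mechanisms named in the abstract (not the paper's implementation, whose details are not given here), the sketch below approximates statistical attention as a temporal-Laplacian activity score thresholded at a lower quantile, and contrastive attention as an entropy-based gain that weights the position-wise fusion of two aligned clips. All function names, the entropy proxy for information gain, and the weighting scheme are assumptions.

```python
import numpy as np

def statistical_attention(clip, q=0.25):
    """Hypothetical sketch of statistical attention: score per-feature
    activity in a (time x features) clip with a discrete Laplace operator,
    then enhance features whose activity exceeds a lower-quantile cutoff."""
    lap = np.abs(np.diff(clip, n=2, axis=0))   # second temporal difference
    activity = lap.mean(axis=0)                # per-feature activity score
    thresh = np.quantile(activity, q)          # lower-quantile threshold
    active = activity > thresh                 # "active" feature mask
    # Amplify active features; leave the rest unchanged.
    weights = np.where(active, 1.0 + activity / (activity.max() + 1e-8), 1.0)
    return clip * weights

def contrastive_fusion(clip_a, clip_b, eps=1e-8):
    """Hypothetical sketch of contrastive attention: weight two aligned
    enhanced clips (e.g. sEMG and IMU) by an information-gain-style score
    and fuse them position-wise."""
    def neg_entropy_gain(x):
        # Concentrated feature energy -> low entropy -> higher gain.
        p = np.abs(x).sum(axis=0)
        p = p / (p.sum() + eps)
        h = -(p * np.log(p + eps)).sum()
        return 1.0 / (h + eps)

    g_a, g_b = neg_entropy_gain(clip_a), neg_entropy_gain(clip_b)
    w_a = g_a / (g_a + g_b)
    return w_a * clip_a + (1.0 - w_a) * clip_b
```

A plausible usage, per the abstract's pipeline, would be `contrastive_fusion(statistical_attention(emg_clip), statistical_attention(imu_clip))`, with the fused features then fed to a Transformer-based recognizer.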
Pages: 1431-1443 (13 pages)
Related Papers (50 in total)
  • [1] A multimodal framework for sensor based sign language recognition
    Kumar, Pradeep
    Gauba, Himaanshu
    Roy, Partha Pratim
    Dogra, Debi Prosad
    NEUROCOMPUTING, 2017, 259 : 21 - 38
  • [2] A Cross-Attention BERT-Based Framework for Continuous Sign Language Recognition
    Zhou, Zhenxing
    Tam, Vincent W. L.
    Lam, Edmund Y.
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1818 - 1822
  • [4] Multi-Modal Fusion Sign Language Recognition Based on Residual Network and Attention Mechanism
    Chu, Chaoqin
    Xiao, Qinkun
    Zhang, Yinhuan
    Liu, Xing
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (12)
  • [5] Sign language recognition based on global-local attention
    Zhang, Shujun
    Zhang, Qun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 80
  • [6] Sign, Attend and Tell: Spatial Attention for Sign Language Recognition
    Sarhan, Noha
    Frintrop, Simone
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021,
  • [7] Holistic-Based Cross-Attention Modal Fusion Network for Video Sign Language Recognition
    Gao, Qing
    Hu, Jing
    Mai, Haixing
    Ju, Zhaojie
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [8] Fusion of Attention-Based Convolution Neural Network and HOG Features for Static Sign Language Recognition
    Kumari, Diksha
    Anand, Radhey Shyam
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [9] A Multimodal Fusion Model Based on Hybrid Attention Mechanism for Gesture Recognition
    Li, Yajie
    Chen, Yiqiang
    Gu, Yang
    Ouyang, Jianquan
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2020, 2021, 12644 : 302 - 312
  • [10] A multi-modal fusion framework for continuous sign language recognition based on multi-layer self-attention mechanism
    Xue, Cuihong
    Yu, Ming
    Yan, Gang
    Qin, Mengxian
    Liu, Yuehao
    Jia, Jingli
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (04) : 4303 - 4316