Weighted Multi-modal Sign Language Recognition

Cited: 0
Authors
Liu, Edmond [1 ]
Lim, Jong Yoon [1 ]
MacDonald, Bruce [1 ]
Ahn, Ho Seok [1 ]
Affiliations
[1] Univ Auckland, Dept Elect Comp & Software Engn, Fac Engn, Auckland, New Zealand
Keywords
DOI
10.1109/RO-MAN60168.2024.10731214
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multiple modalities can boost accuracy in the difficult task of Sign Language Recognition (SLR); however, each modality does not necessarily contribute information of the same quality. Current multi-modal approaches either assign the same importance weighting to every modality or set weightings based on unproven heuristics. This paper takes a systematic approach to finding the optimal weights by performing a grid search. First, we create a multi-modal version of the RGB-only WLASL100 dataset with additional hand-crop and skeletal-pose modalities. Second, we create a 3D-CNN-based weighted multi-modal sign language recognition network (WMSLRnet). Finally, we run grid searches to find the optimal weighting for each modality. We show that very minor adjustments to the weightings can have major effects on the final SLR accuracy. On WLASL100, we significantly outperform previous networks of similar design and achieve high SLR accuracy without highly complex pre-training schemes or extra data.
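The weighted fusion and grid search described in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the authors' code: it assumes three modalities (RGB, hand crops, skeletal pose) whose classifiers emit per-class logits, and it exhaustively tries weight triples that sum to one; all function names, array shapes, and the 0.05 step size are hypothetical.

import itertools
import numpy as np

def fuse(logits_by_modality, weights):
    # Weighted late fusion: sum per-modality class logits,
    # each scaled by its importance weight.
    return sum(w * l for w, l in zip(weights, logits_by_modality))

def accuracy(logits, labels):
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def grid_search_weights(logits_by_modality, labels, step=0.05):
    # Exhaustively try weight triples (w_rgb, w_hand, w_pose) that
    # sum to 1 and keep the triple with the best fused accuracy.
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_weights, best_acc = None, -1.0
    for w_rgb, w_hand in itertools.product(grid, grid):
        w_pose = 1.0 - w_rgb - w_hand
        if w_pose < -1e-9:  # outside the weight simplex; skip
            continue
        weights = (w_rgb, w_hand, max(w_pose, 0.0))
        acc = accuracy(fuse(logits_by_modality, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc

# Stand-in data: 32 validation clips, 100 glosses (WLASL100-sized),
# three modalities. Real use would cache each network's logits once.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(32, 100)) for _ in range(3)]
labels = rng.integers(0, 100, size=32)
weights, acc = grid_search_weights(logits, labels)

Because the search only re-weights cached logits, each candidate triple costs a single weighted sum, so even a fine grid is cheap to evaluate after the per-modality networks have been run once.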
Pages: 880 - 885
Page count: 6