Weighted Multi-modal Sign Language Recognition

Cited: 0
Authors
Liu, Edmond [1 ]
Lim, Jong Yoon [1 ]
MacDonald, Bruce [1 ]
Ahn, Ho Seok [1 ]
Affiliations
[1] Univ Auckland, Dept Elect Comp & Software Engn, Fac Engn, Auckland, New Zealand
DOI
10.1109/RO-MAN60168.2024.10731214
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multiple modalities can boost accuracy in the difficult task of Sign Language Recognition (SLR); however, each modality does not necessarily contribute the same quality of information. Current multi-modal approaches assign the same importance weighting to each modality, or set weightings based on unproven heuristics. This paper takes a systematic approach to finding the optimal weights by performing grid search. Firstly, we create a multi-modal version of the RGB-only WLASL100 data with additional hand-crop and skeletal-pose modalities. Secondly, we create a 3D-CNN-based weighted multi-modal sign language network (WMSLRnet). Finally, we run various grid searches to find the optimal weighting for each modality. We show that very minor adjustments in the weightings can have major effects on the final SLR accuracy. On WLASL100, we significantly outperform previous networks of similar design, and achieve high accuracy in SLR without highly complex pre-training schemes or extra data.
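The weighted fusion and grid search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function names (`fuse`, `grid_search`), the modality names, and the late-fusion-over-softmax-scores formulation are assumptions for the sketch; the paper's WMSLRnet may combine modalities differently.

```python
import itertools
import numpy as np

def fuse(logit_dict, weights):
    """Weighted late fusion: each modality's softmax scores are scaled
    by its weight and summed into a single score matrix."""
    fused = None
    for name, logits in logit_dict.items():
        e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
        probs = e / e.sum(axis=1, keepdims=True)
        fused = weights[name] * probs if fused is None else fused + weights[name] * probs
    return fused

def grid_search(logit_dict, labels, step=0.1):
    """Exhaustively try weight combinations that sum to 1 (at the given
    granularity) and keep the combination with the best accuracy."""
    names = list(logit_dict)
    best_acc, best_w = -1.0, None
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w in itertools.product(grid, repeat=len(names) - 1):
        last = 1.0 - sum(w)          # remaining mass for the final modality
        if last < -1e-9:
            continue                 # weights must sum to 1
        weights = dict(zip(names, list(w) + [last]))
        preds = fuse(logit_dict, weights).argmax(axis=1)
        acc = (preds == labels).mean()
        if acc > best_acc:
            best_acc, best_w = acc, weights
    return best_w, best_acc
```

With three modalities (e.g. RGB, hand crop, pose) and a step of 0.1, this searches on the order of a hundred weight combinations on held-out validation predictions, which is cheap because only the fusion step is re-run, not the underlying 3D CNNs.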
Pages: 880 - 885
Page count: 6
Related Papers
50 records in total
  • [21] Multi-modal Sensing for Behaviour Recognition
    Wang, Ziwei
    Liu, Jiajun
    Arablouei, Reza
    Bishop-Hurley, Greg
    Matthews, Melissa
    Borges, Paulo
    PROCEEDINGS OF THE 28TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING, ACM MOBICOM 2022, 2022, : 900 - 902
  • [22] Traffic Sign Recognition Based on Parameter-Free Detector and Multi-modal Representation
    Gu Mingqin
    Chen Xiaohua
    Zhang Shaoyong
    Ren Xiaoping
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2016 COLLOCATED WORKSHOPS, 2016, 10049 : 115 - 124
  • [23] Multi-Modal Multi-Action Video Recognition
    Shi, Zhensheng
    Liang, Ju
    Li, Qianqian
    Zheng, Haiyong
    Gu, Zhaorui
    Dong, Junyu
    Zheng, Bing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13658 - 13667
  • [24] Multi-modal sign icon retrieval for augmentative communication
    Wu, CH
    Chiu, YH
    Cheng, KW
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2001, PROCEEDINGS, 2001, 2195 : 598 - 605
  • [25] On Multi-modal Fusion for Freehand Gesture Recognition
    Schak, Monika
    Gepperth, Alexander
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 862 - 873
  • [26] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368
  • [27] Traffic Sign Recognition via Multi-Modal Tree-Structure Embedded Multi-Task Learning
    Lu, Xiao
    Wang, Yaonan
    Zhou, Xuanyu
    Zhang, Zhenjun
    Ling, Zhigang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2017, 18 (04) : 960 - 972
  • [28] Directing Humanoids in a Multi-modal Command Language
    Oka, Tetsushi
    Abe, Toyokazu
    Shimoji, Masato
    Nakamura, Takuya
    Sugita, Kaoru
    Yokota, Masao
    2008 17TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, VOLS 1 AND 2, 2008, : 580 - 585
  • [29] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [30] Modality Mixer for Multi-modal Action Recognition
    Lee, Sumin
    Woo, Sangmin
    Park, Yeonju
    Nugroho, Muhammad Adi
    Kim, Changick
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3297 - 3306