Unsupervised extraction of phonetic units in sign language videos for natural language processing

Cited by: 0
Authors
Martinez-Guevara, Niels [1 ]
Rojano-Caceres, Jose-Rafael [1 ]
Curiel, Arturo [2 ]
Affiliations
[1] Univ Veracruzana, Fac Estadist & Informat, Xalapa, Veracruz, Mexico
[2] Univ Veracruzana CONACyT, Xalapa, Veracruz, Mexico
Keywords
Sign language; Machine learning; Natural language processing; Image thresholding; FRAMEWORK
DOI
10.1007/s10209-022-00888-6
Chinese Library Classification
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Sign languages (SL) are the natural languages used by Deaf communities to communicate with each other. Signers use visible parts of their bodies, such as their hands, to convey messages without sound. Because of this change in modality, SLs have to be represented differently in natural language processing (NLP) tasks: inputs are typically presented as video data rather than text or sound, which makes even simple tasks computationally intensive. Moreover, the applicability of NLP techniques to SL processing is limited by their linguistic characteristics. For instance, current research in SL recognition has centered on lexical sign identification. However, SLs tend to exhibit smaller vocabularies than vocal languages, as signers codify part of their message through highly iconic signs that are not lexicalized. Thus, a great deal of potentially relevant information is lost to most NLP algorithms. Furthermore, most documented SL corpora contain fewer than a hundred hours of video, far from enough to train most non-symbolic NLP approaches. This article proposes a method for the unsupervised identification of phonetic units in SL videos, based on image thresholding and the Liddell and Johnson Movement-Hold Model [13]. The procedure strives to identify the smallest possible linguistic units that may carry relevant information, in an effort to avoid losing sub-lexical data that would otherwise be missed by most NLP algorithms. Furthermore, the process enables the elimination of noisy or redundant video frames from the input, decreasing overall computational costs. The algorithm was tested on a collection of Mexican Sign Language videos. The relevance of the extracted segments was assessed by human judges, and further comparisons were carried out against French Sign Language (LSF) resources to explore how well the algorithm performs across different SLs. The results show that the frames selected by the algorithm contained enough information to remain comprehensible to human signers. In some cases, as much as 80% of the available frames could be discarded without loss of comprehensibility, which may have direct repercussions on how SLs are represented, transmitted and processed electronically in the future.
Pages: 1143-1151
Page count: 9
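
The abstract only sketches the method at a high level. As an illustration of the core idea, the following Python sketch shows one plausible reading of hold-frame selection under the Movement-Hold model: frames whose inter-frame motion falls below a threshold are treated as candidate "holds" and kept, while high-motion transition frames are discarded. This is not the authors' implementation; OpenCV (cv2), the mean-absolute-difference motion score, and the motion_threshold value are all assumptions made for this example.

import cv2          # pip install opencv-python
import numpy as np

def select_hold_frames(video_path: str, motion_threshold: float = 4.0):
    """Return indices of candidate 'hold' frames: frames whose mean
    absolute pixel difference from the previous frame falls below
    motion_threshold (a hypothetical value; the paper's actual
    thresholding procedure may differ)."""
    cap = cv2.VideoCapture(video_path)
    kept, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # Crude motion score: mean absolute inter-frame difference.
            motion = float(np.mean(cv2.absdiff(gray, prev)))
            if motion < motion_threshold:
                kept.append(idx)  # low motion: candidate "hold" frame
        prev = gray
        idx += 1
    cap.release()
    return kept

Runs of consecutive kept indices would then correspond to hold segments, from which a single representative frame could be sampled; discarding the remaining movement frames is what would yield the frame reductions (up to 80%) reported in the abstract.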