Unsupervised extraction of phonetic units in sign language videos for natural language processing

Cited by: 0
Authors
Martinez-Guevara, Niels [1 ]
Rojano-Caceres, Jose-Rafael [1 ]
Curiel, Arturo [2 ]
Affiliations
[1] Univ Veracruzana, Fac Estadist & Informat, Xalapa, Veracruz, Mexico
[2] Univ Veracruzana CONACyT, Xalapa, Veracruz, Mexico
Keywords
Sign language; Machine learning; Natural language processing; Image thresholding; Framework
DOI
10.1007/s10209-022-00888-6
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Sign languages (SLs) are the natural languages used by Deaf communities to communicate with each other. Signers use visible parts of their bodies, such as their hands, to convey messages without sound. Because of this modality change, SLs have to be represented differently in natural language processing (NLP) tasks: inputs are typically presented as video data rather than text or sound, which makes even simple tasks computationally intensive. Moreover, the applicability of NLP techniques to SL processing is limited by their linguistic characteristics. For instance, current research in SL recognition has centered on lexical sign identification. However, SLs tend to exhibit smaller vocabularies than vocal languages, as signers encode part of their message through highly iconic signs that are not lexicalized. Thus, a great deal of potentially relevant information is lost to most NLP algorithms. Furthermore, most documented SL corpora contain fewer than a hundred hours of video, far from enough to train most non-symbolic NLP approaches. This article proposes a method for the unsupervised identification of phonetic units in SL videos, based on image thresholding and the Liddell and Johnson Movement-Hold model [13]. The procedure strives to identify the smallest possible linguistic units that may carry relevant information, in an effort to preserve sub-lexical data that would otherwise be missed by most NLP algorithms. The process also enables the elimination of noisy or redundant video frames from the input, decreasing overall computation costs. The algorithm was tested on a collection of Mexican Sign Language videos, and the relevance of the extracted segments was assessed by human judges. Further comparisons were carried out against French Sign Language (LSF) resources to explore how well the algorithm performs across different SLs. The results show that the frames selected by the algorithm contained enough information to remain comprehensible to human signers. In some cases, as much as 80% of the available frames could be discarded without loss of comprehensibility, which may have direct repercussions on how SLs are represented, transmitted, and processed electronically in the future.
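The abstract's core idea, pairing image thresholding with the Movement-Hold model, lends itself to a simple frame-differencing interpretation: holds are runs of frames with little inter-frame change, while movements show large change. The sketch below is an illustrative assumption, not the authors' implementation; the OpenCV-based motion score, the `motion_threshold` value, and the `hold_keyframes` heuristic are placeholders standing in for the paper's actual thresholding procedure.

```python
# Illustrative sketch (not the paper's method): segment a sign language
# video into "movement" and "hold" phases by thresholding inter-frame
# differences, in the spirit of the Movement-Hold model. The threshold
# and the grayscale-difference metric are assumptions for illustration.
import cv2
import numpy as np

def segment_movement_hold(video_path, motion_threshold=2.0):
    """Label each frame as 'hold' (low motion) or 'movement' (high motion).

    motion_threshold is the mean absolute grayscale difference per pixel
    above which a frame counts as movement; a real system would calibrate
    it per corpus rather than hard-code it.
    """
    cap = cv2.VideoCapture(video_path)
    labels = []
    prev_gray = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None:
            labels.append("hold")  # no motion estimate for the first frame
        else:
            # Mean absolute pixel difference serves as a crude motion score.
            score = float(np.mean(cv2.absdiff(gray, prev_gray)))
            labels.append("movement" if score > motion_threshold else "hold")
        prev_gray = gray
    cap.release()
    return labels

def hold_keyframes(labels):
    """Keep one frame index per run of consecutive 'hold' frames.

    Holds are where the Movement-Hold model locates most phonetic content,
    so the remaining frames can often be dropped to cut processing cost.
    """
    keep = []
    for i, label in enumerate(labels):
        if label == "hold" and (i == 0 or labels[i - 1] != "hold"):
            keep.append(i)
    return keep
```

Keeping only one frame per hold run is the kind of reduction that would let a large share of frames (up to 80% in the paper's experiments) be discarded while the phonetically informative content survives.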
Pages: 1143-1151
Page count: 9
Related Papers
50 records in total
  • [1] Unsupervised extraction of phonetic units in sign language videos for natural language processing
    Martínez-Guevara, Niels
    Rojano-Cáceres, José-Rafael
    Curiel, Arturo
    [J]. Universal Access in the Information Society, 2023, 22 : 1143 - 1151
  • [2] Detection of phonetic units of the Mexican Sign Language
    Martinez-Guevara, Niels
    Rojano-Caceres, Jose-Rafael
    Curiel, Arturo
    [J]. 2019 INTERNATIONAL CONFERENCE ON INCLUSIVE TECHNOLOGIES AND EDUCATION (CONTIE 2019), 2019, : 168 - 173
  • [3] Unsupervised Discovery of Fingerspelled Letters in Sign Language Videos
    Duman, Feyza
    Ipek, Tanya Deniz
    Saraclar, Murat
    [J]. 29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [4] RECOGNITION WITH RAW CANONICAL PHONETIC MOVEMENT AND HANDSHAPE SUBUNITS ON VIDEOS OF CONTINUOUS SIGN LANGUAGE
    Theodorakis, Stavros
    Pitsikalis, Vassilis
    Rodomagoulakis, Isidoros
    Maragos, Petros
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1413 - 1416
  • [5] Sign Language Articulators on Phonetic Bearings
    Liddell, Scott K.
    Johnson, Robert E.
    [J]. SIGN LANGUAGE STUDIES, 2019, 20 (01) : 132 - 172
  • [6] Translating Speech to Indian Sign Language Using Natural Language Processing
    Sharma, Purushottam
    Tulsian, Devesh
    Verma, Chaman
    Sharma, Pratibha
    Nancy, Nancy
    [J]. FUTURE INTERNET, 2022, 14 (09):
  • [7] Sign lowering and phonetic reduction in American Sign Language
    Tyrone, Martha E.
    Mauk, Claude E.
    [J]. JOURNAL OF PHONETICS, 2010, 38 (02) : 317 - 328
  • [8] Unsupervised multi-sense language models for natural language processing tasks
    Roh, Jihyeon
    Park, Sungjin
    Kim, Bo-Kyeong
    Oh, Sang-Hoon
    Lee, Soo-Young
    [J]. NEURAL NETWORKS, 2021, 142 : 397 - 409
  • [9] Alignment Based Extraction of Isolated Signs from Sign Language Videos
    Santemiz, Pinar
    Aran, Oya
    Saraclar, Murat
    Akarun, Lale
    [J]. 2009 IEEE 17TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 758+
  • [10] Cartoonized Anonymization of Sign Language Videos
    Tze, Christina O.
    Filntisis, Panagiotis P.
    Roussos, Anastasios
    Maragos, Petros
    [J]. 2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,