Exploring discrete speech units for privacy-preserving and efficient speech recognition for school-aged and preschool children

Cited by: 0
Authors
Dutta, Satwik [1]
Irvin, Dwight [2]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS, Dallas, TX 75080 USA
[2] Univ Florida, Anita Zucker Ctr Excellence Early Childhood Studie, Gainesville, FL USA
Funding
National Science Foundation (USA);
Keywords
Automatic speech recognition; Discrete speech representation; Child speech processing; Speaker privacy; Early childhood; Educational technology; Preschool children; Developmental delay; SPEAKER;
DOI
10.1016/j.ijhcs.2025.103460
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Organizations across the world, including NATO, OECD, the WHO, and the United Nations, as well as many governments, are now employing guidelines for safe, secure, and trustworthy Artificial Intelligence (AI). While technology policies are still being formulated, many AI applications aimed at children have already been developed or are in development. When designing any child-centered AI, it is of utmost importance to keep children's privacy at the forefront. One modality for child-centered AI is speech/language communication, which has found applications in various educational technologies, tutoring services, and interactive learning and social robots. Without full de-identification of speech segments, longer sentences and audio content can reveal partially identifying but neutral information (e.g., the gender of a child); when such audio is taken in a longer-duration context with sequenced longitudinal data (e.g., audio recordings over full days at home or in classrooms, linked over time), privacy concerns grow and become critical. Motivated by a privacy-preserving design, this study explores the use of discrete speech units as a form of anonymous encoding to develop Automatic Speech Recognition (ASR) systems for children that better ensure privacy protection. The primary goal is to ascertain that discrete speech units retain the key linguistic information needed for the ASR task of producing output text, while lacking identifying speaker-specific information and the ability to re-generate the original speech waveform from the available sequence of discrete speech units. Here, a Discrete ASR model trained on the My Science Tutor Children's Conversational Speech Corpus (MyST) achieves an output word-error-rate (WER) of 15.7%.
Our Discrete ASR model achieves similar WER to state-of-the-art End-to-End (E2E) ASR models trained using features extracted from large-scale self-supervised pre-trained speech processing models (such as WavLM), although the E2E ASR models are almost 10 times larger in model checkpoint memory size and number of model parameters and take three times as long to train. In addition, open-domain testing on other popular child speech corpora confirms that the proposed Discrete ASR models perform on par with E2E ASR models for corpora containing children's speech in the same age range as MyST (e.g., CMU corpus) and slightly worse for a corpus containing a wider age range of children (e.g., OGI corpus). Finally, this study also shows that child ASR using the proposed discrete speech units achieves promising performance in recognizing WH-Words, Nouns, Verbs, and Pronouns in an early childhood case study of teacher-child interactions in a childcare facility involving preschool children with and without speech/language delays, an extremely vulnerable and challenging population for speech/language assessment.
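The abstract reports results as word-error-rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical, the record does not say which scoring tool the authors used, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative sketch only; not the scoring pipeline used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words gives a WER of 25.0%.
    print(f"{word_error_rate('what is a circuit', 'what is circuit'):.1%}")
```

Under this definition, a WER of 15.7% corresponds to roughly 16 word errors per 100 reference words. Because insertions count as errors, WER can exceed 100% for very poor hypotheses.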
Pages: 16