wav2vec2-based Speech Rating System for Children with Speech Sound Disorder

Cited by: 10
Authors
Getman, Yaroslav [1]
Al-Ghezi, Ragheb [1]
Voskoboinik, Ekaterina [1]
Grosz, Tamas [1]
Kurimo, Mikko [1]
Salvi, Giampiero [2]
Svendsen, Torbjorn [2]
Strombergsson, Sofia [3]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Espoo, Finland
[2] Norwegian Univ Sci & Technol, Signal Proc, Trondheim, Norway
[3] Karolinska Inst, Dept Clin Sci Intervent & Technol, Stockholm, Sweden
Source
Keywords
speech assessment; goodness of pronunciation; children speech; ASR; wav2vec2
DOI
10.21437/Interspeech.2022-10103
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, we need a highly accurate and automatic way of assessing the quality of speech uttered by young children. Our work focuses on exploring the applicability of state-of-the-art self-supervised, deep acoustic models, mainly wav2vec2, for this task. The empirical results highlight that these self-supervised models are superior to traditional approaches and close the gap between machine and human performance.
Pages: 3618-3622
Page count: 5
Related papers
50 in total
  • [1] Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition
    Sinha, Abhijit
    Singh, Mittul
    Kadiri, Sudarsana Reddy
    Kurimo, Mikko
    Kathania, Hemant Kumar
    2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024
  • [2] A WAV2VEC2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
    Jain, Rishabh
    Barcovschi, Andrei
    Yiwere, Mariam Yahayah
    Bigioi, Dan
    Corcoran, Peter
    Cucu, Horia
    IEEE ACCESS, 2023, 11 : 46938 - 46948
  • [3] Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering
    Grosz, Tamas
    Porjazovski, Dejan
    Getman, Yaroslav
    Kadiri, Sudarsana Reddy
    Kurimo, Mikko
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 7026 - 7029
  • [4] Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer
    Shekar, Ram C. M. C.
    Yang, Mu
    Hirschi, Kevin
    Looney, Stephen
    Kang, Okim
    Hansen, John
    INTERSPEECH 2023, 2023 : 984 - 988
  • [5] Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features
    Shahin, Mostafa
    Nan, Zheng
    Sethu, Vidhyasaharan
    Ahmed, Beena
    INTERSPEECH 2023, 2023 : 4119 - 4123
  • [6] Keyword spotting for dialectal speech and Introduction of wav2vec2.0
    Ariga, Tomohiro
    Minakawa, Reo
    Kojima, Kazunori
    Lee, Shi-Wook
    Itoh, Yoshiaki
    APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, 2024
  • [7] Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0
    Kunesova, Marie
    Rezackova, Marketa
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 377 - 388
  • [8] Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
    Stefanel Gris, Lucas Rafael
    Casanova, Edresson
    de Oliveira, Frederico Santos
    Soares, Anderson da Silva
    Candido Junior, Arnaldo
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 333 - 343
  • [9] Siamese Network with Wav2vec Feature for Spoofing Speech Detection
    Xie, Yang
    Zhang, Zhenchuan
    Yang, Yingchun
    INTERSPEECH 2021, 2021 : 4269 - 4273
  • [10] Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders
    Svec, Jan
    Polak, Filip
    Bartos, Ales
    Zapletalova, Michaela
    Vita, Martin
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 501 - 512