wav2vec2-based Speech Rating System for Children with Speech Sound Disorder

Cited by: 10
Authors
Getman, Yaroslav [1]
Al-Ghezi, Ragheb [1]
Voskoboinik, Ekaterina [1]
Grosz, Tamas [1]
Kurimo, Mikko [1]
Salvi, Giampiero [2]
Svendsen, Torbjorn [2]
Strombergsson, Sofia [3]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Espoo, Finland
[2] Norwegian Univ Sci & Technol, Signal Proc, Trondheim, Norway
[3] Karolinska Inst, Dept Clin Sci Intervent & Technol, Stockholm, Sweden
Source
Keywords
speech assessment; goodness of pronunciation; children speech; ASR; wav2vec2;
DOI
10.21437/Interspeech.2022-10103
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, we need a highly accurate and automatic way of assessing the quality of speech uttered by young children. Our work focuses on exploring the applicability of state-of-the-art self-supervised, deep acoustic models, mainly wav2vec2, for this task. The empirical results highlight that these self-supervised models are superior to traditional approaches and close the gap between machine and human performance.
Pages: 3618 - 3622
Page count: 5
Related papers
50 in total
  • [31] Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings
    Kodali, Manila
    Kadiri, Sudarsana Reddy
    Alku, Paavo
    INTERSPEECH 2023, 2023: 4134 - 4138
  • [32] The speech perception skills of children with and without speech sound disorder
    Hearnshaw, Stephanie
    Baker, Elise
    Munro, Natalie
    JOURNAL OF COMMUNICATION DISORDERS, 2018, 71 : 61 - 71
  • [33] Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
    Zhu, Qiushi
    Zhang, Jie
    Gu, Yu
    Hu, Yuchen
    Dai, Lirong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024: 19768 - 19776
  • [34] What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
    Yang, Mu
    Shekar, Ram C. M. C.
    Kang, Okim
    Hansen, John H. L.
    INTERSPEECH 2023, 2023: 1923 - 1927
  • [35] Combining wav2vec 2.0 Fine-Tuning and ConLearnNet for Speech Emotion Recognition
    Sun, Chenjing
    Zhou, Yi
    Huang, Xin
    Yang, Jichen
    Hou, Xianhua
    ELECTRONICS, 2024, 13 (06)
  • [36] Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
    Yi, Cheng
    Wang, Jianzong
    Cheng, Ning
    Zhou, Shiyu
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [37] W2V2-Light: A Lightweight Version of Wav2vec 2.0 for Automatic Speech Recognition
    Kim, Dong-Hyun
    Lee, Jae-Hong
    Mo, Ji-Hwan
    Chang, Joon-Hyuk
    INTERSPEECH 2022, 2022: 3038 - 3042
  • [38] A CLOSER LOOK AT WAV2VEC2 EMBEDDINGS FOR ON-DEVICE SINGLE-CHANNEL SPEECH ENHANCEMENT
    Shankar, Ravi
    Tan, Ke
    Xu, Buye
    Kumar, Anurag
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024: 751 - 755
  • [39] Implications of diadochokinesia in children with speech sound disorder
    Wertzner, Haydee Fiszbein
    Pagan-Neves, Luciana de Oliveira
    Alves, Renata Ramos
    Barrozo, Tatiane Faria
    CODAS, 2013, 25 (01): 52 - 58
  • [40] Speech perception and children with speech sound disorder: An assessment of non-errored speech sounds
    Hitchcock, Elaine R.
    Koenig, Laura
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03)