wav2vec2-based Speech Rating System for Children with Speech Sound Disorder

Cited by: 10
Authors
Getman, Yaroslav [1 ]
Al-Ghezi, Ragheb [1 ]
Voskoboinik, Ekaterina [1 ]
Grosz, Tamas [1 ]
Kurimo, Mikko [1 ]
Salvi, Giampiero [2 ]
Svendsen, Torbjorn [2 ]
Strombergsson, Sofia [3 ]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Espoo, Finland
[2] Norwegian Univ Sci & Technol, Signal Proc, Trondheim, Norway
[3] Karolinska Inst, Dept Clin Sci Intervent & Technol, Stockholm, Sweden
Source
INTERSPEECH 2022
Keywords
speech assessment; goodness of pronunciation; children speech; ASR; wav2vec2;
DOI
10.21437/Interspeech.2022-10103
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, we need a highly accurate and automatic way of assessing the quality of speech uttered by young children. Our work focuses on exploring the applicability of state-of-the-art self-supervised, deep acoustic models, mainly wav2vec2, for this task. The empirical results highlight that these self-supervised models are superior to traditional approaches and close the gap between machine and human performance.
Pages: 3618-3622
Number of pages: 5