A study on fine-tuning wav2vec2.0 Model for the task of Mispronunciation Detection and Diagnosis

Cited by: 19
Authors
Peng, Linkai [1 ]
Fu, Kaiqi [1 ]
Lin, Binghuai [2 ]
Ke, Dengfeng [1 ]
Zhan, Jinsong [1 ]
Affiliations
[1] Beijing Language & Culture Univ, Beijing, Peoples R China
[2] Tencent Technol Co Ltd, Smart Platform Prod Dept, Shenzhen, Peoples R China
Source: INTERSPEECH 2021
Keywords
self-supervised; mispronunciation detection and diagnosis (MDD); computer-aided pronunciation training (CAPT); wav2vec 2.0; pre-training
DOI
10.21437/Interspeech.2021-1344
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. The mainstream approach is based on deep-neural-network automatic speech recognition, which unfortunately requires massive human-annotated speech recordings for training. Because second-language learners vary widely in mother tongue, age, and proficiency level, it is difficult to gather a large amount of matching data for acoustic-model training, which greatly limits model performance. In this paper, we explore the use of the self-supervised pre-training (SSP) model wav2vec 2.0 for MDD tasks. SSP learns general representations from a large unlabelled dataset and can then be applied to downstream tasks. We conduct experiments on two publicly available datasets (TIMIT, L2-ARCTIC), and our best system achieves an F1-score of 60.44%. Moreover, our method achieves an F1-score of 55.52% with three times less labelled data, which demonstrates the effectiveness of SSP for MDD.
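The abstract only sketches the recipe, so the snippet below is a rough illustration (not the authors' released code) of one common way to fine-tune a pre-trained wav2vec 2.0 encoder with a CTC head for phone recognition using the Hugging Face transformers library. The checkpoint name, phone inventory, learning rate, and dummy data are illustrative assumptions; in an MDD system the recognised phone sequence would then be aligned against the canonical pronunciation of the prompt to detect and diagnose errors.

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

# Toy phone inventory; index 0 doubles as the CTC blank/pad symbol (assumption).
PHONES = ["<pad>", "aa", "ae", "ah", "b", "ch", "d", "iy"]

feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, return_attention_mask=True
)

# Load the self-supervised encoder and attach a freshly initialised CTC head
# sized to the phone vocabulary (the checkpoint carries no task-specific head).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",            # assumed pre-trained checkpoint
    vocab_size=len(PHONES),
    pad_token_id=0,
    ctc_loss_reduction="mean",
    ignore_mismatched_sizes=True,
)
model.freeze_feature_encoder()           # keep the convolutional front-end frozen

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One illustrative training step on dummy data: 1 second of audio and a short
# phone-level transcript (real training would iterate over TIMIT / L2-ARCTIC).
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
labels = torch.tensor([[2, 4, 7]])       # target phone ids for this utterance

model.train()
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# At test time, greedy CTC decoding gives a recognised phone sequence that can
# be aligned against the canonical phones of the prompt to flag mispronunciations.
model.eval()
with torch.no_grad():
    logits = model(input_values=inputs.input_values).logits
predicted_ids = torch.unique_consecutive(logits.argmax(dim=-1)[0])
predicted_phones = [PHONES[i] for i in predicted_ids.tolist() if i != 0]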
Pages: 4448-4452
Page count: 5
Related papers
50 records in total
  • [1] Explore Wav2vec 2.0 for Mispronunciation Detection
    Xu, Xiaoshuo
    Kang, Yueteng
    Cao, Songjun
    Lin, Binghuai
    Ma, Long
    INTERSPEECH 2021, 2021, : 4428 - 4432
  • [2] Enhancing Stuttering Detection and Classification using Wav2Vec2.0
    Sen, Madhurima
    Das, Pradip K.
    2024 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING, AISP, 2024
  • [3] Combining wav2vec 2.0 Fine-Tuning and ConLearnNet for Speech Emotion Recognition
    Sun, Chenjing
    Zhou, Yi
    Huang, Xin
    Yang, Jichen
    Hou, Xianhua
    ELECTRONICS, 2024, 13 (06)
  • [4] FINE-TUNING WAV2VEC2 FOR SPEAKER RECOGNITION
    Vaessen, Nik
    Van Leeuwen, David A.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7967 - 7971
  • [5] Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction
    Becerra, Helard
    Ragano, Alessandro
    Hines, Andrew
    INTERSPEECH 2022, 2022, : 4088 - 4092
  • [6] Keyword spotting for dialectal speech and Introduction of wav2vec2.0
    Ariga, Tomohiro
    Minakawa, Reo
    Kojima, Kazunori
    Lee, Shi-Wook
    Itoh, Yoshiaki
    APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, 2024
  • [7] Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech
    Javanmardi, Farhad
    Kadiri, Sudarsana Reddy
    Alku, Paavo
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (08) : 4951 - 4962
  • [8] Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper
    Kozhirbayev, Zhanibek
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (06) : 1382 - 1389
  • [9] Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model
    Grosz, Tamas
    Getman, Yaroslav
    Al-Ghezi, Ragheb
    Rouhe, Aku
    Kurimo, Mikko
    INTERSPEECH 2023, 2023, : 196 - 200
  • [10] Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths
    Liu, Jiajun
    Wumaier, Aishan
    Wei, Dongping
    Guo, Shen
    APPLIED SCIENCES-BASEL, 2023, 13 (13)