A study on fine-tuning wav2vec2.0 Model for the task of Mispronunciation Detection and Diagnosis

Cited by: 19
Authors
Peng, Linkai [1]
Fu, Kaiqi [1]
Lin, Binghuai [2]
Ke, Dengfeng [1]
Zhan, Jinsong [1]
Affiliations
[1] Beijing Language & Culture Univ, Beijing, Peoples R China
[2] Tencent Technol Co Ltd, Smart Platform Prod Dept, Shenzhen, Peoples R China
Source
Keywords
self-supervised; mispronunciation detection and diagnosis (MDD); computer-aided pronunciation training (CAPT); wav2vec 2.0; pre-training
DOI
10.21437/Interspeech.2021-1344
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. The mainstream approach is based on deep neural network automatic speech recognition. Unfortunately, this technique requires massive amounts of human-annotated speech recordings for training. Because second language learners vary widely in mother tongue, age, and proficiency level, it is difficult to gather a large amount of matching data for acoustic model training, which greatly limits model performance. In this paper, we explore the use of the self-supervised pre-training (SSP) model wav2vec2.0 for MDD tasks. SSP uses a large unlabelled dataset to learn general representations that can be applied to downstream tasks. We conduct experiments on two publicly available datasets (TIMIT and L2-ARCTIC), and our best system achieves an F1-score of 60.44%. Moreover, our method achieves an F1-score of 55.52% with 3 times less data, which demonstrates the effectiveness of SSP on MDD.
Pages: 4448-4452
Page count: 5