A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages

被引:4
|
作者
Sun, Lixu [1 ,2 ]
Yolwas, Nurmemet [1 ,2 ]
Jiang, Lina [1 ,2 ]
机构
[1] Xinjiang Multilingual Informat Technol Lab, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830017, Peoples R China
来源
APPLIED SCIENCES-BASEL | 2023年 / 13卷 / 08期
基金
中国国家自然科学基金;
关键词
contrastive learning; negative samples; wav2vec; 2; 0; low-resource;
D O I
10.3390/app13084836
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
Building an effective automatic speech recognition system typically requires a large amount of high-quality labeled data; However, this can be challenging for low-resource languages. Currently, self-supervised contrastive learning has shown promising results in low-resource automatic speech recognition, but there is no discussion on the quality of negative sample sets in speech contrastive learning. In this paper, we propose the false negatives impact elimination (FNIE) method to filter false negative samples and improve the quality of the negative sample set in speech. FNIE compares the support vector with the negative sample vector set and optimizes the corresponding loss function, allowing the model to learn better speech representations and achieve superior results in low-resource speech recognition. Experiments demonstrate that FNIE effectively filters negative samples, enhances the quality of the negative sample set, and improves the accuracy of speech recognition. The quality of the negative sample set significantly affects the model's learning ability, and using too many negative samples can deteriorate it. In a low-resource setting, our FNIE method achieved a relative improvement of 2.98% in WER on the English dataset, 14.3% in WER on the Uyghur dataset, and 4.04% in CER on the Mandarin dataset compared to the baseline model.
引用
收藏
页数:14
相关论文
共 50 条
  • [1] Meta adversarial learning improves low-resource speech recognition
    Chen, Yaqi
    Yang, Xukui
    Zhang, Hao
    Zhang, Wenlin
    Qu, Dan
    Chen, Cong
    [J]. COMPUTER SPEECH AND LANGUAGE, 2024, 84
  • [2] ON SCALING CONTRASTIVE REPRESENTATIONS FOR LOW-RESOURCE SPEECH RECOGNITION
    Borgholt, Lasse
    Tax, Tycho M. S.
    Havtorn, Jakob D.
    Maaloe, Lars
    Igel, Christian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3885 - 3889
  • [3] Speech recognition datasets for low-resource Congolese languages
    Kimanuka, Ussen
    Maina, Ciira wa
    Buyuk, Osman
    [J]. DATA IN BRIEF, 2024, 52
  • [4] SUPERVISED AND UNSUPERVISED ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION OF LOW-RESOURCE LANGUAGES
    Syed, Ali Raza
    Rosenberg, Andrew
    Kislal, Ellen
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5320 - 5324
  • [5] Low-resource Sinhala Speech Recognition using Deep Learning
    Karunathilaka, Hirunika
    Welgama, Viraj
    Nadungodage, Thilini
    Weerasinghe, Ruvan
    [J]. 2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 196 - 201
  • [6] META-LEARNING FOR LOW-RESOURCE SPEECH EMOTION RECOGNITION
    Chopra, Suransh
    Mathur, Puneet
    Sawhney, Ramit
    Shah, Rajiv Ratn
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6259 - 6263
  • [7] END-TO-END SPEECH RECOGNITION AND KEYWORD SEARCH ON LOW-RESOURCE LANGUAGES
    Rosenberg, Andrew
    Audhkhasi, Kartik
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    Picheny, Michael
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5280 - 5284
  • [8] AUTOMATIC RATING OF SPONTANEOUS SPEECH FOR LOW-RESOURCE LANGUAGES
    Al-Ghezi, Ragheb
    Getman, Yaroslav
    Voskoboinik, Ekaterina
    Singh, Mittul
    Kurimo, Mikko
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 339 - 345
  • [9] Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition
    Chen, Dongpeng
    Mak, Brian Kan-Wing
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (07) : 1172 - 1183
  • [10] Language-Adversarial Transfer Learning for Low-Resource Speech Recognition
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 621 - 630