A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages

Cited by: 4
Authors
Sun, Lixu [1, 2]
Yolwas, Nurmemet [1, 2]
Jiang, Lina [1, 2]
Affiliations
[1] Xinjiang Multilingual Information Technology Laboratory, Urumqi 830017, People's Republic of China
[2] College of Information Science and Engineering, Xinjiang University, Urumqi 830017, People's Republic of China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 08
Funding
National Natural Science Foundation of China
Keywords
contrastive learning; negative samples; wav2vec 2.0; low-resource
DOI
10.3390/app13084836
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Building an effective automatic speech recognition (ASR) system typically requires a large amount of high-quality labeled data, which is difficult to obtain for low-resource languages. Self-supervised contrastive learning has shown promising results in low-resource ASR, but the quality of the negative sample sets used in speech contrastive learning has not yet been examined. In this paper, we propose the false negatives impact elimination (FNIE) method, which filters out false negative samples to improve the quality of the negative sample set in speech. FNIE compares the support vector with the negative sample vector set and optimizes the corresponding loss function, allowing the model to learn better speech representations and achieve better results in low-resource speech recognition. Experiments show that FNIE effectively filters negative samples, improves the quality of the negative sample set, and increases speech recognition accuracy. The quality of the negative sample set significantly affects the model's learning ability, and using too many negative samples can degrade it. In a low-resource setting, FNIE achieved relative improvements over the baseline model of 2.98% in word error rate (WER) on the English dataset, 14.3% in WER on the Uyghur dataset, and 4.04% in character error rate (CER) on the Mandarin dataset.
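The abstract does not say how FNIE measures similarity or exactly which loss it optimizes. As a rough illustration only, the sketch below assumes a wav2vec 2.0-style InfoNCE loss over cosine similarities (temperature 0.1) and treats a negative as a likely false negative when its cosine similarity to the positive target, used here as a stand-in for the "support vector", reaches a fixed threshold. The function names, the 0.9 threshold, and this choice of support vector are hypothetical and are not the authors' implementation. The last helper shows how a relative error-rate improvement, like those reported in the abstract, is computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_false_negatives(support: Vector, negatives: List[Vector],
                           threshold: float = 0.9) -> List[Vector]:
    """Hypothetical FNIE-style filter: drop negatives too similar to the support vector.

    Assumption: a negative whose cosine similarity to the support vector is at or
    above `threshold` probably encodes the same speech unit (a false negative).
    """
    return [n for n in negatives if cosine(support, n) < threshold]


def contrastive_loss(context: Vector, positive: Vector, negatives: List[Vector],
                     temperature: float = 0.1) -> float:
    """InfoNCE loss in the style of wav2vec 2.0:
    -log( exp(sim(c, q+)/T) / sum over q in {q+} and negatives of exp(sim(c, q)/T) ).
    """
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline_error - new_error) / baseline_error * 100.0


if __name__ == "__main__":
    context = [1.0, 0.0]   # toy context vector
    positive = [1.0, 0.0]  # toy positive target, used here as the support vector
    # The first negative is nearly identical to the positive target.
    negatives = [[0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]

    kept = filter_false_negatives(positive, negatives)
    print(len(kept))  # 2: the near-duplicate negative is dropped
    print(contrastive_loss(context, positive, negatives))
    print(contrastive_loss(context, positive, kept))
    print(relative_improvement(20.0, 19.0))  # hypothetical error rates, not the paper's
```

Filtering the near-duplicate negative lowers the loss because the model is no longer penalized for scoring a probable same-unit frame as similar to the context; a real system would apply this per frame over sampled negatives, and the threshold would need tuning.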
Pages: 14