Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Cited by: 0
Authors
Pineiro-Martin, Andres [1, 2]
Garcia-Mateo, Carmen [1 ]
Docio-Fernandez, Laura [1 ]
Del Carmen Lopez-Perez, Maria [2 ]
Rehm, Georg [3 ]
Affiliations
[1] Univ Vigo, AtlanTTic Res Ctr, GTM Res Grp, Vigo, Spain
[2] Balidea Consulting & Programming SL, Santiago De Compostela, Spain
[3] DFKI GmbH, Speech & Language Technol Lab, Berlin, Germany
Source
Interspeech 2024
Keywords
Continual multilingual learning; automatic speech recognition; weighted cross-entropy; low-resource language; data augmentation
DOI
10.21437/Interspeech.2024-734
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
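The language-weighted cross-entropy idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the language labels, the weight value of 2.0, and the choice to normalize by token count are all assumptions made for illustration, and the "dynamic" scheduling of the weights is not described in this record, so the weights here are static.

```python
import math

def language_weighted_cross_entropy(probs, targets, langs, lang_weights):
    """Mean cross-entropy in which each token's loss is scaled by a
    per-language weight (hypothetical helper, not the paper's code).

    probs        -- list of predicted probability distributions, one per token
    targets      -- list of correct class indices, one per token
    langs        -- list of language labels, one per token
    lang_weights -- dict mapping language label to weight (default 1.0)
    """
    total = 0.0
    for p, t, lang in zip(probs, targets, langs):
        weight = lang_weights.get(lang, 1.0)
        total += weight * -math.log(p[t])
    # Assumption: normalize by token count rather than by the sum of weights.
    return total / len(targets)

# Toy batch: two tokens from high-resource languages, one from a low-resource one.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
targets = [0, 1, 2]
langs = ["high_a", "high_b", "low"]

unweighted = language_weighted_cross_entropy(probs, targets, langs, {})
weighted = language_weighted_cross_entropy(probs, targets, langs, {"low": 2.0})
```

With a weight above 1.0 on the low-resource language, errors on its tokens contribute more to the loss than errors on the other languages, which is the general mechanism weighted cross-entropy uses for unbalanced data.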
Pages: 1235-1239
Page count: 5