Comparison of Unsupervised Learning and Supervised Learning with Noisy Labels for Low-Resource Speech Recognition

Cited: 0
Authors
Schraner, Yanick [1 ]
Scheller, Christian [1 ]
Pluess, Michel [1 ]
Neukom, Lukas [1 ]
Vogel, Manfred [1 ]
Affiliations
[1] Univ Appl Sci & Arts Northwestern Switzerland, Windisch, Switzerland
Source
Interspeech 2022
Funding
Swiss National Science Foundation
Keywords
speech recognition; low-resource; speech translation; semi-supervised; self-supervised; forced alignment;
DOI
10.21437/Interspeech.2022-10620
CLC Classification Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Supervised training of end-to-end speech recognition systems usually requires large amounts of transcribed speech data to achieve reasonable performance. This hinders its application to problems where annotated data is scarce, since manual labeling is costly. Often, however, large amounts of speech data with imperfect transcriptions are available, which can be automatically aligned to generate noisy labels. In this work, we compare supervised learning on noisy data from forced alignment with semi-supervised learning and self-supervised representation learning, both of which have shown great success in improving speech recognition using unlabeled data. We employ noisy student training for semi-supervised learning and wav2vec 2.0 for self-supervised representation learning. We compare these methods on 2324 hours of Swiss German audio with automatically aligned Standard German text. Using speech data with noisy labels for supervised learning leads to a word error rate (WER) of 26.4% on our test set. Using the same data for wav2vec pretraining leads to a WER of 27.8%. With noisy student training, we achieve a WER of 30.3%.
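For readers who want a concrete picture of the supervised route described in the abstract, the following is a minimal sketch of CTC fine-tuning of a wav2vec 2.0 model on audio paired with noisy transcripts, such as transcripts produced by forced alignment. It is an illustrative assumption based on the public Hugging Face transformers API, not the authors' implementation; the checkpoint name, single-utterance training step, and uppercase English vocabulary are placeholders (the paper targets Swiss German audio with Standard German text, which would require a matching vocabulary).

    # Minimal sketch (illustrative, not the authors' code): supervised CTC
    # fine-tuning of wav2vec 2.0 on (audio, noisy transcript) pairs, e.g. pairs
    # obtained via forced alignment. Checkpoint and vocabulary are placeholders.
    import numpy as np
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def train_step(waveform: np.ndarray, noisy_transcript: str) -> float:
        """One supervised step on a single 16 kHz utterance with a noisy label."""
        # Raw mono audio samples -> normalized model inputs.
        inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
        # Noisy transcript (e.g. from forced alignment) -> CTC target ids.
        labels = processor.tokenizer(noisy_transcript.upper(), return_tensors="pt").input_ids
        # The model computes the CTC loss against the noisy labels internally.
        outputs = model(input_values=inputs.input_values, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return outputs.loss.item()

In the paper's comparison, the same 2324 hours of audio are used in three ways: with the automatically aligned text as noisy labels for supervised training (26.4% WER), as audio-only data for wav2vec 2.0 self-supervised pretraining (27.8% WER), and as unlabeled audio pseudo-labeled by a teacher model for noisy student training (30.3% WER).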
Pages: 4875-4879 (5 pages)
Related Papers (50 in total)
  • [1] Syed, Ali Raza; Rosenberg, Andrew; Kislal, Ellen. Supervised and Unsupervised Active Learning for Automatic Speech Recognition of Low-Resource Languages. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2016: 5320-5324.
  • [2] Karunathilaka, Hirunika; Welgama, Viraj; Nadungodage, Thilini; Weerasinghe, Ruvan. Low-Resource Sinhala Speech Recognition Using Deep Learning. 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer 2020), 2020: 196-201.
  • [3] Chen, Yaqi; Yang, Xukui; Zhang, Hao; Zhang, Wenlin; Qu, Dan; Chen, Cong. Meta Adversarial Learning Improves Low-Resource Speech Recognition. Computer Speech and Language, 2024, 84.
  • [4] Chopra, Suransh; Mathur, Puneet; Sawhney, Ramit; Shah, Rajiv Ratn. Meta-Learning for Low-Resource Speech Emotion Recognition. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 6259-6263.
  • [5] Zhou, Rui; Koshikawa, Takaki; Ito, Akinori; Nose, Takashi; Chen, Chia-Ping. Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition. IEEE Access, 2024, 12: 158493-158504.
  • [6] Sun, Lixu; Yolwas, Nurmemet; Jiang, Lina. A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages. Applied Sciences-Basel, 2023, 13 (08).
  • [7] Chen, Dongpeng; Mak, Brian Kan-Wing. Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23 (07): 1172-1183.
  • [8] Yi, Jiangyan; Tao, Jianhua; Wen, Zhengqi; Bai, Ye. Language-Adversarial Transfer Learning for Low-Resource Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27 (03): 621-630.
  • [9] Hsu, Jui-Yang; Chen, Yuan-Jui; Lee, Hung-yi. Meta Learning for End-to-End Low-Resource Speech Recognition. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 7844-7848.
  • [10] Singh, Amber; Anand, R. S. Speech Recognition Using Supervised and Unsupervised Learning Techniques. 2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015: 691-696.