Comparison of Unsupervised Learning and Supervised Learning with Noisy Labels for Low-Resource Speech Recognition

Authors
Schraner, Yanick [1]
Scheller, Christian [1]
Pluess, Michel [1]
Neukom, Lukas [1]
Vogel, Manfred [1]
Affiliation
[1] Univ Appl Sci & Arts Northwestern Switzerland, Windisch, Switzerland
Source
INTERSPEECH 2022
Funding
Swiss National Science Foundation
Keywords
speech recognition; low-resource; speech translation; semi-supervised; self-supervised; forced alignment
DOI
10.21437/Interspeech.2022-10620
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Supervised training of end-to-end speech recognition systems usually requires large amounts of transcribed speech data to achieve reasonable performance. This hinders its application to problems where little annotated data is available, since manual labeling is costly. Often, however, large amounts of speech data with imperfect transcriptions are available, which can be automatically aligned to generate noisy labels. In this work, we compare supervised learning on noisy data from forced alignment with semi-supervised learning and self-supervised representation learning. The latter two have shown great success in improving speech recognition using unlabeled data. We employ noisy student training for semi-supervised learning and wav2vec 2.0 for self-supervised representation learning. We compare these methods on 2324 hours of Swiss German audio with automatically aligned Standard German text. Using the speech data with noisy labels for supervised learning yields a word error rate (WER) of 26.4% on our test set, using the same data for wav2vec 2.0 pretraining yields a WER of 27.8%, and noisy student training yields a WER of 30.3%.
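The WER figures above are conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming plain whitespace tokenization and no text normalization; the helper name `word_error_rate` is hypothetical, and the paper's exact scoring and normalization setup for Swiss German audio against Standard German references is not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / number of reference words.

    Assumes whitespace tokenization and no normalization (casing,
    punctuation, and number formatting are compared as-is).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (Levenshtein DP, row by row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words gives a WER of 0.25.
    print(word_error_rate("das ist ein test", "das ist test"))
```

Because insertions are counted, WER is not bounded by 1.0: a hypothesis with many extra words can exceed 100%.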
Pages: 4875-4879
Number of pages: 5