Comparison of Unsupervised Learning and Supervised Learning with Noisy Labels for Low-Resource Speech Recognition

Citations: 0

Authors:

Schraner, Yanick [1]

Scheller, Christian [1]

Pluess, Michel [1]

Neukom, Lukas [1]

Vogel, Manfred [1]

Affiliations:

[1] Univ Appl Sci & Arts Northwestern Switzerland, Windisch, Switzerland

Source:

INTERSPEECH 2022 | 2022

Funding:

Swiss National Science Foundation

Keywords:

speech recognition; low-resource; speech translation; semi-supervised; self-supervised; forced alignment

DOI:

10.21437/Interspeech.2022-10620

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Supervised training of end-to-end speech recognition systems usually requires large amounts of transcribed speech data to achieve reasonable performance. Because manual labeling is costly, this hinders its application to problems where little annotated data is available. Often, however, large amounts of speech data with imperfect transcriptions are available, which can be automatically aligned to generate noisy labels. In this work, we examine how supervised learning on noisy data from forced alignment compares to semi-supervised learning and self-supervised representation learning. The latter two have shown great success in improving speech recognition using unlabeled data. We employ noisy student training for semi-supervised learning and wav2vec 2.0 for self-supervised representation learning. We compare these methods on 2324 hours of Swiss German audio with automatically aligned Standard German text. Using speech data with noisy labels for supervised learning leads to a word error rate (WER) of 26.4% on our test set. Using the same data for wav2vec pretraining leads to a WER of 27.8%. With noisy student training, we achieve a WER of 30.3%.
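The results above are reported as word error rate. As a reading aid, the following is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the authors' scoring code; the abstract does not describe their text normalization or tokenization, so the whitespace tokenization and the function name `word_error_rate` here are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, not taken from the paper's data.
print(word_error_rate("das ist ein test", "das ist test"))
```

In this example the hypothesis drops one of four reference words, so the sketch yields one deletion over four words. A WER of 26.4% corresponds to roughly 26 such word-level errors per 100 reference words.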


Pages: 4875-4879

Page count: 5
