SEMI-SUPERVISED TRAINING IN LOW-RESOURCE ASR AND KWS

Cited: 0
Authors
Metze, Florian [1 ,2 ]
Gandhe, Ankur [1 ,2 ]
Miao, Yajie [1 ,2 ]
Sheikh, Zaid [1 ,2 ]
Wang, Yun [1 ,2 ]
Xu, Di [1 ,2 ]
Zhang, Hao [1 ,2 ]
Kim, Jungsuk [3 ,4 ]
Lane, Ian [3 ,4 ]
Lee, Won Kyum [3 ,4 ]
Stueker, Sebastian [5 ]
Mueller, Markus [5 ]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Moffett Field, CA USA
[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Moffett Field, CA USA
[5] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
US National Science Foundation (NSF)
Keywords
spoken term detection; automatic speech recognition; low-resource LTs; semi-supervised training
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In particular for "low-resource" Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on telephony-quality speech in the low-resource languages Assamese, Bengali, Lao, Haitian, Zulu, and Tamil, demonstrating the impact that such techniques can have, in particular learning robust bottleneck features on the test data. In the case of Tamil, where significantly more test data than training data is available, we integrated semi-supervised training and speaker adaptation on the test data and achieved significant additional improvements in STT and KWS.
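As a reading aid for the abstract above: the core idea of semi-supervised training here is to decode untranscribed audio with a seed recognizer and reuse only the confidently decoded utterances as extra (pseudo-labeled) training material, for example to retrain bottleneck features or the acoustic model. The sketch below is a minimal, hypothetical illustration of such confidence-based data selection; the DecodedUtterance structure, the select_for_retraining helper, and the 0.7 confidence threshold are assumptions for illustration only and are not taken from the paper.

```python
# Illustrative sketch: confidence-based selection of automatically transcribed
# utterances for semi-supervised retraining. Thresholds and names are assumed.
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedUtterance:
    utt_id: str            # utterance identifier
    hypothesis: str        # 1-best transcript from the seed recognizer
    avg_confidence: float  # mean per-word confidence (e.g. lattice posterior), in [0, 1]


def select_for_retraining(decoded: List[DecodedUtterance],
                          min_confidence: float = 0.7,
                          min_words: int = 3) -> List[DecodedUtterance]:
    """Keep only utterances whose automatic transcript looks reliable enough
    to be reused as pseudo-labeled training data."""
    return [
        utt for utt in decoded
        if utt.avg_confidence >= min_confidence
        and len(utt.hypothesis.split()) >= min_words
    ]


if __name__ == "__main__":
    # Toy decoding output; in practice this would come from decoding the
    # untranscribed test data with the seed system.
    decoded = [
        DecodedUtterance("utt_0001", "switch on the light", 0.91),
        DecodedUtterance("utt_0002", "uh", 0.95),                     # too short
        DecodedUtterance("utt_0003", "call my brother today", 0.45),  # low confidence
    ]
    for utt in select_for_retraining(decoded):
        # Selected transcripts would be added to the small supervised set
        # before retraining bottleneck features and/or the acoustic model.
        print(utt.utt_id, utt.hypothesis)
```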
Pages: 4699-4703
Number of pages: 5
Related papers
50 records in total (the first 10 are listed below)
  • [1] Large scale weakly and semi-supervised learning for low-resource video ASR
    Singh, Kritika
    Manohar, Vimal
    Xiao, Alex
    Edunov, Sergey
    Girshick, Ross
    Liptchinsky, Vitaliy
    Fuegen, Christian
    Saraf, Yatharth
    Zweig, Geoffrey
    Mohamed, Abdelrahman
    INTERSPEECH 2020, 2020: 3770-3774
  • [2] Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
    Zhu, Han
    Wang, Li
    Wang, Jindong
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yan, Yonghong
    INTERSPEECH 2022, 2022: 4870-4874
  • [3] Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR
    Choudhary, Tripti
    Goyal, Vishal
    Bansal, Atul
    INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS, 2025, 18 (01)
  • [4] A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition
    Du, Ye-Qian
    Zhang, Jie
    Fang, Xin
    Wu, Ming-Hui
    Yang, Zhou-Wang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 3908-3921
  • [5] On the Learning Dynamics of Semi-Supervised Training for ASR
    Wallington, Electra
    Kershenbaum, Benji
    Klejch, Ondrej
    Bell, Peter
    INTERSPEECH 2021, 2021: 716-720
  • [6] Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
    Biswas, Astik
    Menon, Raghav
    van der Westhuizen, Ewald
    Niesler, Thomas
    INTERSPEECH 2019, 2019: 3008-3012
  • [7] IMPROVING SEMI-SUPERVISED CLASSIFICATION FOR LOW-RESOURCE SPEECH INTERACTION APPLICATIONS
    Kumar, Manoj
    Papadopoulos, Pavlos
    Travadi, Ruchir
    Bone, Daniel
    Narayanan, Shrikanth
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5149-5153
  • [8] Semi-Supervised Learning Based on Reference Model for Low-resource TTS
    Zhang, Xulong
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022: 966-971
  • [9] Semi-supervised DNN training with word selection for ASR
    Vesely, Karel
    Burget, Lukas
    Cernocky, Jan Honza
    INTERSPEECH 2017, 2017: 3687-3691
  • [10] Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
    Vesely, Karel
    Segura, Carlos
    Szoke, Igor
    Luque, Jordi
    Cernocky, Jan Honza
    INTERSPEECH 2018, 2018: 2883-2887