Semi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data

Cited by: 3
Authors
Egorova, Ekaterina [1 ,2 ,3 ]
Luque Serrano, Jordi [1 ]
Affiliations
[1] Telefonica Research, Edificio Telefonica Diagonal, Barcelona 08019, Spain
[2] Brno University of Technology, Speech@FIT, Brno 61200, Czech Republic
[3] Brno University of Technology, IT4Innovations Centre of Excellence, Brno 61200, Czech Republic
Keywords
Speech recognition; language modeling; semi-supervised learning
DOI
10.1016/j.procs.2016.04.038
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification
081203; 0835
Abstract
This work addresses a common issue arising when building a speech recognition system in a low-resource scenario: adapting the language model on unlabeled audio data. The proposed methodology exploits such data by means of semi-supervised learning. While it has been shown that adding system-generated labels to the acoustic-model training data yields good results, the benefits of adding system-generated sentence hypotheses to the language model are less clear in the literature. This investigation focuses on the latter by exploring different criteria for selecting valuable, well-transcribed sentences. These criteria range from confidence measures at the word and sentence level to sentence duration metrics and grammatical structure frequencies. The processing pipeline starts by training a seed speech recognizer on only twenty hours of the Fisher Spanish telephone conversation corpus. The proposed procedure then augments this initial system with transcriptions generated automatically from unlabeled data using the seed system. After these transcriptions are generated, their reliability is estimated, and only those with high scores are added to the training data. Experimental results show improvements gained by the use of the augmented language model. Although these improvements are smaller than those obtained from a system with acoustic model augmentation only, we consider the proposed approach (with its low cost in terms of computational resources and its ability for task adaptation) an attractive technique worthy of further exploration. (C) 2016 The Authors. Published by Elsevier B.V.
Pages: 114-120
Page count: 7
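
The abstract outlines a filter-and-retrain pipeline: decode unlabeled audio with the seed recognizer, score each hypothesis, and keep only high-scoring sentences for language-model training. A minimal sketch of such a selection step follows, assuming per-word confidences are available from the seed decoder; the helper names, the 0.8 confidence threshold, and the speaking-rate bounds are illustrative assumptions, not taken from the paper.

```python
# Sketch of confidence-based selection of automatic transcriptions for
# language-model augmentation. Assumes each hypothesis carries per-word
# confidence scores from the seed recognizer; thresholds and duration
# criteria below are illustrative placeholders, not the paper's values.
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]               # 1-best transcription of one utterance
    word_confidences: List[float]  # per-word confidences in [0, 1]
    duration_sec: float            # utterance duration from the segmenter


def sentence_confidence(hyp: Hypothesis) -> float:
    """Average word confidence as a simple sentence-level score."""
    return sum(hyp.word_confidences) / max(len(hyp.word_confidences), 1)


def select_hypotheses(hyps: List[Hypothesis],
                      min_confidence: float = 0.8,
                      min_words_per_sec: float = 1.0,
                      max_words_per_sec: float = 4.0) -> List[List[str]]:
    """Keep hypotheses that look well transcribed: high confidence and a
    plausible speaking rate (a stand-in for the duration-based criteria)."""
    selected = []
    for hyp in hyps:
        rate = len(hyp.words) / max(hyp.duration_sec, 1e-3)
        if (sentence_confidence(hyp) >= min_confidence
                and min_words_per_sec <= rate <= max_words_per_sec):
            selected.append(hyp.words)
    return selected

# The selected sentences would then be appended to the seed LM training text
# and the language model re-estimated on the augmented corpus.
```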