Semi-supervised and unsupervised discriminative language model training for automatic speech recognition

Cited by: 6
Authors
Dikici, Erinc [1]
Saraclar, Murat [1]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
Keywords
Discriminative language modeling; Semi-supervised training; Unsupervised training; CLASSIFICATION; RERANKING; RANKING;
DOI
10.1016/j.specom.2016.07.004
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Discriminative language modeling aims to reduce error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual transcriptions (references) as training data, and recognition performance improves as the amount of such matched data increases. In this study we investigate the case where matched data for DLM training is limited or not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we utilize a confusion model to generate artificial hypotheses instead of the real ASR N-bests. For unsupervised training, we propose three target output selection methods to stand in for the missing reference. We handle this task as both a structured prediction problem and a reranking problem, and employ two different variants of the WER-sensitive perceptron algorithm. We show that a significant improvement over baseline ASR accuracy is obtained even when there is no transcribed acoustic data available to train the DLM. © 2016 Elsevier B.V. All rights reserved.
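
The idea of rescoring N-best lists with a WER-sensitive perceptron can be illustrated with a minimal sketch. This is not the authors' implementation: the unigram-count features, the update rule (scaling the perceptron step by the difference in word errors between the current best and the oracle hypothesis), and all function names are assumptions chosen for illustration. The `targets` argument stands for whatever plays the role of the reference: a manual transcription in the supervised case, or a selected target output in the unsupervised case.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def features(hyp):
    """Hypothetical feature map: unigram counts of the hypothesis."""
    return Counter(hyp)


def score(weights, hyp):
    """Linear model score of a hypothesis under the current weights."""
    return sum(weights.get(f, 0.0) * v for f, v in features(hyp).items())


def train(nbest_lists, targets, epochs=5):
    """Perceptron training with updates scaled by the word-error difference."""
    weights = {}
    for _ in range(epochs):
        for nbest, target in zip(nbest_lists, targets):
            errors = [edit_distance(target, h) for h in nbest]
            # Oracle: the hypothesis with the fewest word errors.
            oracle = min(range(len(nbest)), key=errors.__getitem__)
            # Current best: the hypothesis the model ranks highest.
            best = max(range(len(nbest)), key=lambda i: score(weights, nbest[i]))
            loss = errors[best] - errors[oracle]
            if loss > 0:
                # Move toward the oracle and away from the current best.
                for f, v in features(nbest[oracle]).items():
                    weights[f] = weights.get(f, 0.0) + loss * v
                for f, v in features(nbest[best]).items():
                    weights[f] = weights.get(f, 0.0) - loss * v
    return weights


def rerank(weights, nbest):
    """Return the highest-scoring hypothesis in an N-best list."""
    return max(nbest, key=lambda h: score(weights, h))
```

A toy usage under the same assumptions: calling `train([[["i", "scream"], ["ice", "cream"]]], [["ice", "cream"]])` yields weights with which `rerank` prefers `["ice", "cream"]` over `["i", "scream"]`.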
Pages: 54-63
Number of pages: 10