Speech recognition and utterance verification based on a generalized confidence score

Cited by: 28
Authors
Koo, MW [1 ]
Lee, CH [1 ]
Juang, BH [1 ]
Affiliation
[1] Korea Telecom, Spoken Language Res Team, Multimedia Technol Lab, Seoul 137792, South Korea
Keywords
confidence score; speech recognition; utterance verification;
DOI
10.1109/89.966085
CLC Classification Number
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
In this paper, we introduce a generalized confidence score (GCS) function that enables a framework to integrate different confidence scores in speech recognition and utterance verification. A modified decoder based on the GCS is then proposed. The GCS is defined as a combination of various confidence scores obtained by exponential weighting from various confidence information sources, such as likelihood, likelihood ratio, duration, language model probabilities, etc. We also propose the use of a confidence preprocessor to transform raw scores into manageable terms for easy integration. We consider two kinds of hybrid decoders, an ordinary hybrid decoder and an extended hybrid decoder, as implementation examples based on the generalized confidence score. The ordinary hybrid decoder uses a frame-level likelihood ratio in addition to a frame-level likelihood, while a conventional decoder uses only the frame likelihood or likelihood ratio. The extended hybrid decoder uses not only the frame-level likelihood but also multilevel information such as frame-level, phone-level, and word-level confidence scores based on the likelihood ratios. Our experimental evaluation shows that the proposed hybrid decoders give better results than those obtained by the conventional decoders, especially in dealing with ill-formed utterances that contain out-of-vocabulary words and phrases.
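The abstract describes the GCS as a combination of confidence scores from several information sources via exponential weighting. A minimal sketch of that idea, assuming the scores are combined as a weighted geometric mean (i.e., exponential weights in the probability domain, which is a weighted sum in the log domain); the score values, source names, and weights below are hypothetical illustrations, not values from the paper:

```python
import math

def generalized_confidence_score(scores, weights):
    """Combine per-source confidence scores into a single GCS.

    Exponential weighting in the probability domain,
    prod(s_i ** w_i), equals a weighted sum in the log domain.
    Scores are assumed to lie in (0, 1].
    """
    assert len(scores) == len(weights)
    log_gcs = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_gcs)

# Hypothetical per-frame scores from four sources named in the abstract:
# likelihood, likelihood ratio, duration, and language model probability.
scores = [0.82, 0.74, 0.90, 0.65]
weights = [0.4, 0.3, 0.2, 0.1]  # weights sum to 1 -> weighted geometric mean
gcs = generalized_confidence_score(scores, weights)
```

Because the weights sum to one, the combined score always stays between the smallest and largest input score; the paper's confidence preprocessor would additionally transform raw scores into such a manageable range before combination.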
Pages: 821 - 832
Number of pages: 12
Related Papers
50 records in total
  • [31] Correcting recognition errors via discriminative utterance verification
    Setlur, AR; Sukkar, RA; Jacob, J
    ICSLP 96 - Fourth International Conference on Spoken Language Processing, Proceedings, Vols 1-4, 1996: 602 - 605
  • [32] A study on robust utterance verification for connected digits recognition
    Rahim, MG; Lee, CH; Juang, BH
    Journal of the Acoustical Society of America, 1997, 101 (05): 2892 - 2902
  • [33] Estimating confidence measures for speech recognition verification using a smoothed naive Bayes model
    Sanchis, A; Juan, A; Vidal, E
    Pattern Recognition and Image Analysis, Proceedings, 2003, 2652: 910 - 918
  • [34] Verification of speech recognition results incorporating in-domain confidence and discourse coherence measures
    Lane, IR; Kawahara, T
    IEICE Transactions on Information and Systems, 2006, E89D (03): 931 - 938
  • [35] History utterance embedding transformer LM for speech recognition
    Deng, Keqi; Cheng, Gaofeng; Miao, Haoran; Zhang, Pengyuan; Yan, Yonghong
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 5914 - 5918
  • [36] Discrete utterance speech recognition without time alignment
    Shore, JE; Burton, DK
    IEEE Transactions on Information Theory, 1983, 29 (04): 473 - 491
  • [37] Predicting speech recognition confidence using deep learning with word identity and score features
    Huang, Po-Sen; Kumar, Kshitiz; Liu, Chaojun; Gong, Yifan; Deng, Li
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 7413 - 7417
  • [38] Multistage utterance verification for keyword recognition-based online spoken content retrieval
    Park, Jeong-Sik; Jang, Gil-Jin; Kim, Ji-Hwan
    IEEE Transactions on Consumer Electronics, 2012, 58 (03): 1000 - 1005
  • [39] Utterance verification-based dysarthric speech intelligibility assessment using phonetic posterior features
    Fritsch, Julian; Magimai-Doss, Mathew
    IEEE Signal Processing Letters, 2021, 28: 224 - 228
  • [40] Calibration of confidence measures in speech recognition
    Yu, Dong; Li, Jinyu; Deng, Li
    IEEE Transactions on Audio Speech and Language Processing, 2011, 19 (08): 2461 - 2473