Utterance Verification Using Word Voiceprint Models Based on Probabilistic Distributions of Phone-Level Log-Likelihood Ratio and Phone Duration

Authors
Kwon, Suk-Bong [1 ]
Kim, HoiRin [1 ]
Affiliation
[1] Informat & Commun Univ, Taejon, South Korea
Keywords
utterance verification; confidence measure; likelihood ratio testing; word voiceprint
DOI
10.1093/ietisy/e91-d.11.2746
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This paper proposes word voiceprint models for verifying the results produced by a speech recognition system. A word voiceprint model carries word-dependent information based on the distributions of phone-level log-likelihood ratios and phone durations. Because the voiceprint model of each recognized word captures the utterance verification characteristics specific to that word, it yields a more reliable confidence score for the word. In addition, to compute the log-likelihood ratio-based word voiceprint score, this paper proposes a new log-scale normalization function based on the distribution of the phone-level log-likelihood ratio, in place of the sigmoid function commonly applied when computing phone-level log-likelihood ratio scores. This function emphasizes misrecognized phones within a word. This word-specific information helps produce scores that discriminate better against out-of-vocabulary words. Although the proposed method requires additional memory, it achieves a 16.9% relative reduction in equal error rate compared with the baseline system that uses simple phone log-likelihood ratios.
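A minimal Python sketch of the scoring idea, assuming details the abstract does not specify: each word voiceprint stores, per phone position, a Gaussian mean and standard deviation for the phone log-likelihood ratio (LLR) and for the phone duration; the baseline averages sigmoid-normalized phone LLRs; and the proposed log-scale normalization is stood in for by the log of the Gaussian CDF of the LLR. All names (`PhoneStats`, `log_scale_norm`, `voiceprint_word_score`, `dur_weight`, `alpha`, `beta`) and all functional forms are hypothetical illustrations, not the authors' formulas. A threshold-sweep equal error rate (EER) helper is included because EER is the metric the abstract reports.

```python
import math
from dataclasses import dataclass
from statistics import mean

# NOTE: every name and formula below is a hypothetical illustration,
# not the scoring functions defined in the paper.


@dataclass
class PhoneStats:
    """Assumed per-position entry of a word voiceprint model (Gaussian parameters)."""
    llr_mean: float
    llr_std: float
    dur_mean: float  # duration mean, in frames
    dur_std: float   # duration standard deviation, in frames


def sigmoid_word_score(phone_llrs, alpha=1.0, beta=0.0):
    """Baseline: mean of sigmoid-normalized phone LLRs (alpha, beta are assumed constants)."""
    return mean(1.0 / (1.0 + math.exp(-alpha * (llr - beta))) for llr in phone_llrs)


def log_scale_norm(llr, stats, floor=1e-12):
    """Assumed log-scale normalization: log Gaussian CDF of the LLR under the
    word's stored distribution. Unlike the sigmoid, it keeps decreasing for
    low LLRs (down to log(floor)), so a misrecognized phone is emphasized."""
    z = (llr - stats.llr_mean) / stats.llr_std
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(cdf, floor))


def duration_score(dur, stats):
    """Gaussian log-likelihood of the phone duration, up to an additive constant."""
    z = (dur - stats.dur_mean) / stats.dur_std
    return -0.5 * z * z


def voiceprint_word_score(phone_llrs, durations, model, dur_weight=0.5):
    """Average the per-phone LLR and weighted duration terms using the word's model."""
    return mean(
        log_scale_norm(llr, st) + dur_weight * duration_score(d, st)
        for llr, d, st in zip(phone_llrs, durations, model)
    )


def equal_error_rate(genuine, impostor):
    """EER by sweeping thresholds over all observed scores (accept if score >= t)."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Hypothetical two-phone word model and two test tokens.
    model = [PhoneStats(2.0, 1.0, 8.0, 2.0), PhoneStats(1.5, 0.8, 10.0, 3.0)]
    ok = voiceprint_word_score([2.1, 1.6], [8, 10], model)
    bad = voiceprint_word_score([2.1, -3.0], [8, 10], model)  # second phone mismatched
    print(ok > bad)  # True
```

The design point the sketch illustrates: the sigmoid saturates near 0 for a badly matching phone, so its penalty on the word score is bounded, whereas the log-scale term keeps falling (down to the floor), letting a single misrecognized phone pull the whole word score down.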
Pages: 2746-2750
Number of pages: 5