Pronunciation modeling for spontaneous speech recognition using latent pronunciation analysis (LPA) and prior knowledge

Cited by: 0
Authors
Lin, Che-Kuang [1]
Lee, Lin-Shan [1]
Affiliation
[1] National Taiwan University, Taipei 10764, Taiwan
Keywords
pronunciation variation; spontaneous speech; speech recognition; probabilistic latent semantic analysis; distance metrics
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a new framework for pronunciation modeling in which the search algorithm focuses primarily on the clearly pronounced portions of speech while de-emphasizing observations from the slurred portions. The framework is motivated by prior analysis showing that pronunciation variation is related to the predictability and importance of words in spoken utterances, both of which can be estimated to some extent. We define a set of pronunciation-related features and develop Latent Pronunciation Analysis (LPA) to estimate the "latent pronunciation states" in the speech. The LPA probabilities, the pronunciation-related features, and a further set of prior knowledge obtained from two distance measures between phonemes are integrated in an SVM classifier to produce a "pronunciation variation indicator" for each frame, on which Viterbi decoding is then based. Preliminary experiments on Mandarin spontaneous speech gave very encouraging initial results.
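The abstract does not say how the per-frame indicator enters the Viterbi search. One plausible reading, sketched below purely as an illustration, is to scale each frame's acoustic log-likelihood by a reliability weight in [0, 1], so that frames judged to be slurred contribute less to the path score. The function name `weighted_viterbi`, the weighting rule, and the toy two-state model are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def weighted_viterbi(log_emis, log_trans, log_init, weights):
    """Viterbi decoding with per-frame emission weights (hypothetical scheme).

    log_emis  : (T, S) finite log-likelihood of each frame under each state
    log_trans : (S, S) log transition probabilities, row = from, column = to
    log_init  : (S,)   log initial state probabilities
    weights   : (T,)   reliability in [0, 1]; 0 ignores the frame's acoustics
    """
    T, S = log_emis.shape
    # Assumed rule: down-weight the acoustic evidence of unreliable frames.
    scaled = weights[:, None] * log_emis
    delta = log_init + scaled[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # scores[i, j] = best score ending in state i, then moving to state j
        scores = delta[:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + scaled[t]
    # Trace the best path back from the final frame.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example: the middle frame acoustically favors state 1.
emis = np.log(np.array([[0.9, 0.1], [0.01, 0.99], [0.9, 0.1]]))
trans = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
init = np.log(np.array([0.5, 0.5]))

print(weighted_viterbi(emis, trans, init, np.ones(3)))                # [0, 1, 0]
print(weighted_viterbi(emis, trans, init, np.array([1.0, 0.0, 1.0])))  # [0, 0, 0]
```

With unit weights the decoder follows the middle frame's acoustics; with that frame's weight set to zero, the transition model and the neighboring clear frames decide the path. In the framework described above, the weights would presumably come from the SVM-based indicator rather than being set by hand.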
Pages: 673 / +
Page count: 2