Pronunciation modeling for spontaneous speech recognition using latent pronunciation analysis (LPA) and prior knowledge

Cited by: 0

Authors:

Lin, Che-Kuang [1]

Lee, Lin-Shan [1]

Affiliation:

[1] Natl Taiwan Univ, Taipei 10764, Taiwan

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

pronunciation variation; spontaneous speech; speech recognition; probabilistic latent semantic analysis; distance metrics;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

In this paper, we propose a new framework for pronunciation modeling in which the search algorithm focuses primarily on the clearly pronounced portions of speech while deemphasizing the observations from the slurred portions. This is motivated by prior analysis indicating that pronunciation variation is related to the predictability and importance of words in spoken utterances, both of which can be estimated to some extent. We define a set of pronunciation-related features and develop a Latent Pronunciation Analysis (LPA) to estimate the "latent pronunciation states" in the speech. The LPA probabilities, the pronunciation-related features, and a further set of prior knowledge obtained from two distance measures between phonemes are integrated in an SVM classifier to produce a "pronunciation variation indicator" for each frame, on which Viterbi decoding is then based. Preliminary experiments on Mandarin spontaneous speech gave very encouraging initial results.
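The abstract does not say how the per-frame indicator enters the decoder. As a rough illustration only, the sketch below assumes the indicator has been mapped to a reliability weight in [0, 1] that scales each frame's emission log-likelihood inside an ordinary Viterbi search. The function name `weighted_viterbi`, the weighting scheme, and all argument names are hypothetical and are not taken from the paper.

```python
import math

def weighted_viterbi(log_init, log_trans, log_emit, weights):
    """Viterbi decoding with per-frame emission weights (illustrative sketch).

    log_init[s]      log prior of state s
    log_trans[p][s]  log transition probability from state p to state s
    log_emit[t][s]   log-likelihood of frame t under state s
    weights[t]       ASSUMED reliability of frame t in [0, 1]; a small value
                     deemphasizes that frame's acoustic evidence
    """
    n_states = len(log_init)
    # Initialise with the (weighted) evidence of the first frame.
    score = [log_init[s] + weights[0] * log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptrs = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptrs.append(best)
            new_score.append(
                score[best] + log_trans[best][s] + weights[t] * log_emit[t][s]
            )
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path

# Toy two-state example: the middle frame points to the "wrong" state.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_emit = [[-1.0, -5.0], [-10.0, -1.0], [-1.0, -5.0]]

uniform_path = weighted_viterbi(log_init, log_trans, log_emit, [1.0, 1.0, 1.0])
deemphasized_path = weighted_viterbi(log_init, log_trans, log_emit, [1.0, 0.0, 1.0])
```

With every weight set to 1 this reduces to standard Viterbi decoding. Setting a frame's weight to 0 leaves only the transition model to decide the state at that frame, which is one simple way to "deemphasize" a slurred observation.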


Pages: 673 / +

Page count: 2
