Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework

Cited by: 12

Authors:

Razavi, Marzieh [1,2]

Rasipuram, Ramya [1]

Magimai-Doss, Mathew [1]

Affiliations:

[1] Idiap Res Inst, CH-1920 Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland

Source:

SPEECH COMMUNICATION | 2016 / Vol. 80

Keywords:

Grapheme-to-phoneme conversion; Probabilistic lexical modeling framework; Kullback-Leibler divergence-based hidden Markov model; Automatic speech recognition; Lexicon development; SPEECH; ASR;

DOI:

10.1016/j.specom.2016.03.003

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that maps each word to its pronunciation as a sequence of phonemes. Phonemic lexicons can be developed by humans using linguistic knowledge; however, this is a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme conversion (G2P) techniques are used in which, given an initial phonemic lexicon, the relationship between graphemes and phonemes is learned through data-driven methods. This article presents a novel G2P formalism which learns the grapheme-to-phoneme relationship through acoustic data and potentially relaxes the need for an initial phonemic lexicon in the target language. The formalism involves a training part followed by an inference part. In the training part, the grapheme-to-phoneme relationship is captured in a probabilistic lexical modeling framework. In this framework, a hidden Markov model (HMM) is trained in which each HMM state, representing a grapheme, is parameterized by a categorical distribution over phonemes. Then, in the inference part, given the orthographic transcription of a word and the learned HMM, the most probable sequence of phonemes is inferred. In this article, we show that the recently proposed acoustic G2P approach in the Kullback-Leibler divergence-based HMM (KL-HMM) framework is a particular case of this formalism. We then benchmark the approach against two popular G2P approaches, namely the joint multigram approach and the decision tree-based approach. Our experimental studies on English and French show that, despite relatively poor performance at the pronunciation level, the performance of the proposed approach is not significantly different from that of state-of-the-art G2P methods at the ASR level. (C) 2016 Elsevier B.V. All rights reserved.
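The inference step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not spell out the decoding procedure, so the code below assumes a simple ergodic phoneme model decoded with Viterbi, and the phoneme set, the per-grapheme categorical distributions, and the `stay_prob` parameter are all made-up values for illustration only.

```python
import math

# Hypothetical toy parameters (not from the paper): each grapheme state holds
# a categorical distribution over the phonemes listed in PHONEMES.
PHONEMES = ["k", "ae", "t", "s"]
GRAPHEME_STATES = {
    "c": [0.70, 0.05, 0.05, 0.20],
    "a": [0.05, 0.85, 0.05, 0.05],
    "t": [0.05, 0.05, 0.85, 0.05],
}


def infer_phonemes(word, stay_prob=0.5):
    """Viterbi decoding over an assumed ergodic phoneme model.

    Each grapheme contributes its categorical distribution as the local score;
    staying on the same phoneme has probability `stay_prob`, and the remaining
    mass is shared equally among switches to the other phonemes.
    """
    n = len(PHONEMES)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    dists = [GRAPHEME_STATES[g] for g in word]

    # Log scores for the first grapheme (a uniform prior is a constant, so dropped).
    score = [math.log(p) for p in dists[0]]
    backpointers = []

    for dist in dists[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor phoneme i for current phoneme j.
            candidates = [
                score[i] + math.log(stay_prob if i == j else switch_prob)
                for i in range(n)
            ]
            best_i = max(range(n), key=lambda i: candidates[i])
            new_score.append(candidates[best_i] + math.log(dist[j]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back the most probable phoneme path.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collapse consecutive repeats so that several graphemes can map to one phoneme.
    result = []
    for idx in path:
        if not result or result[-1] != PHONEMES[idx]:
            result.append(PHONEMES[idx])
    return result
```

Calling `infer_phonemes("cat")` with these toy distributions walks through the grapheme states for "c", "a" and "t" and returns one phoneme label per distinct segment of the best path. In a trained system the distributions would be estimated from acoustic data rather than written by hand.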


Pages: 1 - 21

Number of pages: 21
