Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework

Cited by: 12
Authors
Razavi, Marzieh [1, 2]
Rasipuram, Ramya [1]
Magimai-Doss, Mathew [1]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Grapheme-to-phoneme conversion; Probabilistic lexical modeling framework; Kullback-Leibler divergence-based hidden Markov model; Automatic speech recognition; Lexicon development; SPEECH; ASR
DOI
10.1016/j.specom.2016.03.003
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that provides a mapping between each word and its pronunciation as a sequence of phonemes. Phonemic lexicons can be developed by humans using linguistic knowledge; however, this is a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme (G2P) conversion techniques are used in which, given an initial phonemic lexicon, the relationship between graphemes and phonemes is learned through data-driven methods. This article presents a novel G2P formalism which learns the grapheme-to-phoneme relationship through acoustic data and potentially relaxes the need for an initial phonemic lexicon in the target language. The formalism involves a training part followed by an inference part. In the training part, the grapheme-to-phoneme relationship is captured in a probabilistic lexical modeling framework. In this framework, a hidden Markov model (HMM) is trained in which each HMM state representing a grapheme is parameterized by a categorical distribution of phonemes. Then, in the inference part, given the orthographic transcription of the word and the learned HMM, the most probable sequence of phonemes is inferred. In this article, we show that the recently proposed acoustic G2P approach in the Kullback-Leibler divergence-based HMM (KL-HMM) framework is a particular case of this formalism. We then benchmark the approach against two popular G2P approaches, namely the joint multigram approach and the decision tree-based approach. Our experimental studies on English and French show that, despite relatively poor performance at the pronunciation level, the performance of the proposed approach is not significantly different from that of the state-of-the-art G2P methods at the ASR level. © 2016 Elsevier B.V. All rights reserved.
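The two-part formalism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the phoneme set, the "-" (no phoneme) symbol, the probability values in `GRAPHEME_STATES`, and the function `infer_pronunciation` are all hypothetical, and the decoding rule (take the most probable phoneme of each grapheme state, then merge repeats) is a deliberate simplification of "inferring the most probable sequence of phonemes". The training part, in which the categorical distributions are learned from acoustic data, is assumed to have already happened.

```python
# Hypothetical phoneme inventory; "-" stands for "this grapheme emits no phoneme".
PHONEMES = ["k", "ae", "t", "sh", "-"]

# Hypothetical "learned" parameters: one categorical distribution over
# PHONEMES per grapheme state. Real values would come from training.
GRAPHEME_STATES = {
    "c": [0.80, 0.05, 0.05, 0.05, 0.05],
    "a": [0.05, 0.85, 0.03, 0.02, 0.05],
    "t": [0.05, 0.05, 0.80, 0.05, 0.05],
    "h": [0.05, 0.05, 0.05, 0.15, 0.70],
}


def infer_pronunciation(word):
    """Simplified inference: pick the most probable phoneme for each
    grapheme state, skip the "-" symbol, and merge immediate repeats."""
    phones = []
    previous = None
    for grapheme in word:
        dist = GRAPHEME_STATES[grapheme]
        # Index of the highest-probability phoneme for this grapheme state.
        best_index = max(range(len(dist)), key=lambda i: dist[i])
        best = PHONEMES[best_index]
        if best != "-" and best != previous:
            phones.append(best)
        previous = best
    return phones


if __name__ == "__main__":
    print(infer_pronunciation("cat"))
```

With these made-up distributions, the word "cat" maps to one phoneme per grapheme. A context-independent argmax of this kind cannot model multi-letter graphemes or context effects, which is one reason a full HMM decoding step is needed in practice.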
Pages: 1-21
Page count: 21