Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework

Cited by: 12
Authors
Razavi, Marzieh [1, 2]
Rasipuram, Ramya [1 ]
Magimai-Doss, Mathew [1 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Grapheme-to-phoneme conversion; Probabilistic lexical modeling framework; Kullback-Leibler divergence-based hidden Markov model; Automatic speech recognition; Lexicon development; SPEECH; ASR
DOI
10.1016/j.specom.2016.03.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that maps each word to its pronunciation as a sequence of phonemes. Phonemic lexicons can be developed by humans using linguistic knowledge, but this is a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme conversion (G2P) techniques are used in which, given an initial phonemic lexicon, the relationship between graphemes and phonemes is learned through data-driven methods. This article presents a novel G2P formalism that learns the grapheme-to-phoneme relationship through acoustic data and potentially relaxes the need for an initial phonemic lexicon in the target language. The formalism involves a training part followed by an inference part. In the training part, the grapheme-to-phoneme relationship is captured in a probabilistic lexical modeling framework: a hidden Markov model (HMM) is trained in which each HMM state represents a grapheme and is parameterized by a categorical distribution over phonemes. Then, in the inference part, given the orthographic transcription of a word and the learned HMM, the most probable sequence of phonemes is inferred. In this article, we show that the recently proposed acoustic G2P approach in the Kullback-Leibler divergence-based HMM (KL-HMM) framework is a particular case of this formalism. We then benchmark the approach against two popular G2P approaches, namely the joint multigram approach and the decision tree-based approach. Our experimental studies on English and French show that, despite relatively poor performance at the pronunciation level, the performance of the proposed approach is not significantly different from that of state-of-the-art G2P methods at the ASR level. (C) 2016 Elsevier B.V. All rights reserved.
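The inference idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes each grapheme state is decoded independently by taking the most probable phoneme from its categorical distribution, whereas the abstract describes inferring the most probable phoneme sequence from the learned HMM as a whole. The table `GRAPHEME_STATES`, its probabilities, and the function `infer_pronunciation` are hypothetical and invented for illustration only.

```python
from itertools import groupby

# Hypothetical categorical distributions P(phoneme | grapheme state).
# In the abstract these are learned from acoustic data; the values
# below are made up for illustration and come from no real system.
GRAPHEME_STATES = {
    "c": {"k": 0.7, "s": 0.3},
    "a": {"ae": 0.6, "ey": 0.4},
    "t": {"t": 0.9, "d": 0.1},
    "s": {"s": 0.8, "z": 0.2},
}


def infer_pronunciation(word):
    """Pick the most probable phoneme per grapheme, then merge repeats.

    Simplifying assumption: graphemes are decoded independently, so no
    sequence-level (HMM) decoding is performed here.
    """
    best = [max(GRAPHEME_STATES[g], key=GRAPHEME_STATES[g].get) for g in word]
    # Collapse runs of the same phoneme, e.g. from a doubled letter.
    return [phoneme for phoneme, _ in groupby(best)]


print(infer_pronunciation("cat"))
print(infer_pronunciation("cats"))
```

A sequence-level decoder would replace the per-grapheme `max` with a search over phoneme sequences, which is the part this sketch deliberately leaves out.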
Pages: 1-21
Number of pages: 21
Related papers
50 in total
  • [11] Grapheme-to-phoneme conversion in Chinese TTS system
    Dong, HH
    Tao, JH
    Xu, B
    [J]. 2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 165 - 168
  • [12] Memory-based Data-driven Approach for Grapheme-to-Phoneme Conversion in Bengali Text-to-Speech Synthesis System
    Ghosh, Krishnendu
    Rao, K. Sreenivasa
    [J]. 2011 ANNUAL IEEE INDIA CONFERENCE (INDICON-2011): ENGINEERING SUSTAINABLE SOLUTIONS, 2011,
  • [13] Label Embedding for Chinese Grapheme-to-Phoneme Conversion
    Choi, Eunbi
    Kim, Hwa-Yeon
    Kim, Jong-Hwan
    Kim, Jae-Min
    [J]. INTERSPEECH 2021, 2021, : 4094 - 4098
  • [14] Automatic Grapheme-to-Phoneme Conversion of Arabic Text
    Al-Daradkah, Belal
    Al-Diri, Bashir
    [J]. 2015 SCIENCE AND INFORMATION CONFERENCE (SAI), 2015, : 468 - 473
  • [15] NARROW ADAPTIVE REGULARIZATION OF WEIGHTS FOR GRAPHEME-TO-PHONEME CONVERSION
    Kubo, Keigo
    Sakti, Sakriani
    Neubig, Graham
    Toda, Tomoki
    Nakamura, Satoshi
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [16] Learning from Errors in Grapheme-to-Phoneme Conversion
    Polyakova, Tatyana
    Bonafonte, Antonio
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2442 - 2445
  • [17] Online Discriminative Training for Grapheme-to-Phoneme Conversion
    Jiampojamarn, Sittichai
    Kondrak, Grzegorz
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1307 - 1310
  • [18] Improved Grapheme-to-Phoneme Conversion for Mandarin TTS
    易立夫
    李健
    郝杰
    熊子瑜
    [J]. Tsinghua Science and Technology, 2009, 14 (05) : 606 - 611
  • [19] Automated Grapheme-to-Phoneme Conversion System for Romanian
    Jozsef, Domokos
    Ovidiu, Buza
    Gavril, Toderean
    [J]. 2011 6TH CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2011,
  • [20] MULTILINGUAL GRAPHEME-TO-PHONEME CONVERSION WITH BYTE REPRESENTATION
    Yu, Mingzhi
    Hieu Duy Nguyen
    Sokolov, Alex
    Lepird, Jack
    Sathyendra, Kanthashree Mysore
    Choudhary, Samridhi
    Mouchtaris, Athanasios
    Kunzmann, Siegfried
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8234 - 8238