New Grapheme Generation Rules for Two-Stage Model-based Grapheme-to-Phoneme Conversion

Cited by: 2
Authors
Kheang, Seng [1]
Katsurada, Kouichi [1]
Iribe, Yurie [2]
Nitta, Tsuneo [1, 3]
Affiliations
[1] Toyohashi Univ Technol, 1-1 Tempaku, Toyohashi, Aichi 4418580, Japan
[2] Aichi Prefectural Univ, Nagakute, Aichi 4801198, Japan
[3] Waseda Univ, Shinjuku Ku, Tokyo 1698050, Japan
Keywords
grapheme generation rules (GGR); combined grapheme-phoneme information; two-stage model; grapheme-to-phoneme (G2P); automatic text-to-phonetic transcription
DOI
10.5614/itbj.ict.res.appl.2014.8.2.6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme, or G2P, conversion) is used in speech synthesis and recognition, pronunciation learning software, spoken term detection, and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems, and because many problems with G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented using an existing weighted finite-state transducer (WFST)-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model automatically converts words to phonemes, while the second-stage model uses the input graphemes together with the output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which add extra detail to the vowel and consonant graphemes appearing within a word. Compared with previous approaches, the evaluation results indicate that our approach, when using rules that focus on the vowel graphemes, slightly improved accuracy on the out-of-vocabulary dataset and consistently increased accuracy on the in-vocabulary dataset.
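The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the WFST models are replaced by toy lookup functions, and the names `combine`, `two_stage_g2p`, `toy_stage1`, `toy_stage2`, the `grapheme:phoneme` token format, the one-to-one alignment, and the phoneme values are all assumptions made for illustration. The grapheme generation rules themselves are not specified in the abstract and are not modeled here.

```python
from typing import Callable, List

# A stage maps a list of input tokens to a list of output tokens.
Stage = Callable[[List[str]], List[str]]


def combine(graphemes: List[str], phonemes: List[str]) -> List[str]:
    """Pair each grapheme with its first-stage phoneme, e.g. 'c:K'.

    Assumption: a one-to-one grapheme/phoneme alignment, which real
    G2P systems generally do not have.
    """
    if len(graphemes) != len(phonemes):
        raise ValueError("this sketch assumes a one-to-one alignment")
    return [f"{g}:{p}" for g, p in zip(graphemes, phonemes)]


def two_stage_g2p(word: str, stage1: Stage, stage2: Stage) -> List[str]:
    """Run stage 1 on the graphemes, then stage 2 on the combined tokens."""
    graphemes = list(word)
    first_pass = stage1(graphemes)            # words -> phonemes
    combined = combine(graphemes, first_pass)  # grapheme + phoneme info
    return stage2(combined)                    # final phoneme sequence


# Hypothetical toy table standing in for a trained first-stage model.
TOY_LETTER_TO_PHONEME = {"c": "K", "a": "AE", "t": "T", "e": "EH"}


def toy_stage1(graphemes: List[str]) -> List[str]:
    return [TOY_LETTER_TO_PHONEME[g] for g in graphemes]


def toy_stage2(combined: List[str]) -> List[str]:
    """Hypothetical second stage: drop the phoneme of a word-final 'e'."""
    final = []
    for i, token in enumerate(combined):
        grapheme, phoneme = token.split(":", 1)
        if grapheme == "e" and i == len(combined) - 1:
            continue  # toy correction that needs the grapheme to decide
        final.append(phoneme)
    return final


if __name__ == "__main__":
    print(two_stage_g2p("cat", toy_stage1, toy_stage2))
    print(two_stage_g2p("ate", toy_stage1, toy_stage2))
```

The point of the sketch is only that the second stage sees both the original grapheme and the first-stage phoneme for each position, so it can revise a first-pass decision using information that the phoneme sequence alone would not carry. The toy outputs are not real pronunciations.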
Pages: 157-174
Page count: 18