Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach

Cited by: 5
Authors
Kheang, Seng [1 ]
Katsurada, Kouichi [1 ]
Iribe, Yurie [2 ]
Nitta, Tsuneo [1 ,3 ]
Affiliations
[1] Toyohashi Univ Technol, Toyohashi, Aichi 4418580, Japan
[2] Aichi Prefectural Univ, Nagakute, Aichi 4801198, Japan
[3] Waseda Univ, Tokyo 1698050, Japan
Source
IEICE Transactions on Information and Systems, 2014, E97-D: 901-910
Keywords
two-stage neural network; grapheme-to-phoneme conversion; many-to-many mapping; prediction through phonemic information; phoneme conflict; SPEECH;
DOI
10.1587/transinf.E97.D.901
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
To achieve high-quality speech synthesis, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper addresses the problem of phoneme conflict, in which an input grapheme can, in the same context, produce several possible output phonemes. To this end, we propose a two-stage neural network-based approach that converts the input text to phoneme sequences in the first stage and then predicts each output phoneme in the second stage using the phonemic information obtained. The first-stage neural network is implemented as a many-to-many mapping model that automatically converts a word to phoneme sequences, while the second stage combines the obtained phoneme sequences to predict the output phoneme corresponding to each input grapheme in the given word. We evaluate the approach on the auto-aligned CMUDict corpus [1], an American English pronunciation dictionary. In terms of phoneme and word accuracy on OOV words, the evaluation results show that, compared with several previously proposed baseline approaches, our approach improves on the earlier one-stage neural network-based approach to G2P conversion. Comparison with another existing approach indicates that our approach provides higher phoneme accuracy but lower word accuracy on a general dataset, and slightly higher phoneme and word accuracy on a selection of words containing more than one phoneme conflict.
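To make the two-stage idea above concrete, the following is a minimal, runnable Python/NumPy sketch of one plausible realization, not the authors' implementation: a first network predicts a phoneme distribution for each grapheme from a grapheme context window, and a second network re-predicts each phoneme from the same window augmented with the first stage's phoneme hypotheses, which is one way to resolve conflicting stage-1 outputs. The toy grapheme and phoneme inventories, the window size, and all names (TinyMLP, g2p_two_stage, etc.) are illustrative assumptions; the feature design, network architecture, and training procedure in the paper may differ.

```python
import numpy as np

GRAPHEMES = list("abcdefghijklmnopqrstuvwxyz_")   # "_" pads the context window
PHONEMES = ["AE", "K", "T", "S", "EY", "_"]       # toy phoneme inventory (assumption)

def one_hot(symbol, inventory):
    v = np.zeros(len(inventory))
    v[inventory.index(symbol)] = 1.0
    return v

def window(seq, i, size, pad):
    """Symmetric context window of `size` items on each side of position i."""
    padded = [pad] * size + list(seq) + [pad] * size
    return padded[i : i + 2 * size + 1]

class TinyMLP:
    """One-hidden-layer network; weights are random here, trained in practice."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))

    def forward(self, x):
        h = np.tanh(x @ self.W1)
        scores = h @ self.W2
        return np.exp(scores) / np.exp(scores).sum()  # softmax over the phoneme set

CTX = 2  # graphemes of context on each side of the current grapheme
stage1 = TinyMLP((2 * CTX + 1) * len(GRAPHEMES), 32, len(PHONEMES))
stage2 = TinyMLP((2 * CTX + 1) * (len(GRAPHEMES) + len(PHONEMES)), 32, len(PHONEMES))

def g2p_two_stage(word):
    # Stage 1: predict a phoneme distribution for every grapheme position from
    # its grapheme context window (a simplified stand-in for the paper's
    # many-to-many word-to-phoneme-sequence conversion).
    stage1_probs = []
    for i in range(len(word)):
        g_ctx = window(word, i, CTX, "_")
        x = np.concatenate([one_hot(g, GRAPHEMES) for g in g_ctx])
        stage1_probs.append(stage1.forward(x))

    # Stage 2: re-predict each phoneme from the grapheme window plus the
    # stage-1 phoneme hypotheses of the same window, so that conflicting
    # stage-1 candidates can be disambiguated with phonemic context.
    output = []
    for i in range(len(word)):
        g_ctx = window(word, i, CTX, "_")
        p_ctx = window(stage1_probs, i, CTX, one_hot("_", PHONEMES))
        x = np.concatenate([one_hot(g, GRAPHEMES) for g in g_ctx] + list(p_ctx))
        output.append(PHONEMES[int(np.argmax(stage2.forward(x)))])
    return output

print(g2p_two_stage("cats"))  # weights are untrained, so the output is arbitrary
```

In practice both networks would be trained on aligned grapheme-phoneme pairs (for example, from the auto-aligned CMUDict corpus); here the weights are random, so only the two-stage data flow is meaningful, not the printed output.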
Pages: 901-910 (10 pages)
Related Papers (showing 10 of 50)
  • [1] Kheang, Seng; Katsurada, Kouichi; Iribe, Yurie; Nitta, Tsuneo. Using Reversed Sequences and Grapheme Generation Rules to Extend the Feasibility of a Phoneme Transition Network-Based Grapheme-to-Phoneme Conversion. IEICE Transactions on Information and Systems, 2016, E99-D(4): 1182-1192.
  • [2] Kheang, Seng; Katsurada, Kouichi; Iribe, Yurie; Nitta, Tsuneo. New Grapheme Generation Rules for Two-Stage Model-based Grapheme-to-Phoneme Conversion. Journal of ICT Research and Applications, 2014, 8(2): 157-174.
  • [3] Kheang, Seng; Katsurada, Kouichi; Iribe, Yurie; Nitta, Tsuneo. Novel Two-Stage Model for Grapheme-to-Phoneme Conversion using New Grapheme Generation Rules. 2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA), 2014: 97-102.
  • [4] Yolchuyeva, Sevinj; Nemeth, Geza; Gyires-Toth, Balint. Grapheme-to-Phoneme Conversion with Convolutional Neural Networks. Applied Sciences-Basel, 2019, 9(6).
  • [5] Yolchuyeva, Sevinj; Nemeth, Geza; Gyires-Toth, Balint. Transformer based Grapheme-to-Phoneme Conversion. Interspeech 2019, 2019: 2095-2099.
  • [6] Yamasaki, Tomohiro. Grapheme-to-Phoneme Conversion for Thai using Neural Regression Models. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 4251-4255.
  • [7] Sokolov, Alex; Rohlin, Tracy; Rastrow, Ariya. Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion. Interspeech 2019, 2019: 2065-2069.
  • [8] Rugchatjaroen, Anocha; Saychum, Sittipong; Kongyoung, Sarawoot; Chootrakool, Patcharika; Kasuriya, Sawit; Wutiwiwatchai, Chai. Efficient two-stage processing for joint sequence model-based Thai grapheme-to-phoneme conversion. Speech Communication, 2019, 106: 105-111.
  • [9] Yoon, Kyuchul; Brew, Chris. A linguistically motivated approach to grapheme-to-phoneme conversion for Korean. Computer Speech and Language, 2006, 20(4): 357-381.
  • [10] Juzova, Marketa; Vit, Jakub. Using Auto-Encoder BiLSTM Neural Network for Czech Grapheme-to-Phoneme Conversion. Text, Speech, and Dialogue (TSD 2019), 2019, 11697: 91-102.