JOINT ALIGNMENT LEARNING-ATTENTION BASED MODEL FOR GRAPHEME-TO-PHONEME CONVERSION

Times cited: 2
Authors
Wang, Yonghe
Bao, Feilong [1]
Zhang, Hui
Gao, Guanglai
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
Keywords
grapheme-to-phoneme conversion; alignment learning; attention; multitask learning; sequence models
DOI
10.1109/ICASSP39728.2021.9413679
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Sequence-to-sequence attention-based models for grapheme-to-phoneme (G2P) conversion have gained significant interest. The attention-based encoder-decoder framework learns the mapping from input to output tokens by selectively focusing on relevant information, and has been shown to perform well. However, the attention mechanism can produce non-monotonic alignments, leading to poor G2P conversion performance. In this paper, we present a novel approach that optimizes the G2P conversion model by directly aligning grapheme and phoneme sequences, using alignment learning (AL) as the loss function. In addition, we propose a multi-task learning method that jointly uses an alignment learning model and an attention model to predict the proper alignments and thus improve the accuracy of G2P conversion. Evaluations on Mongolian and CMUDict tasks show that alignment learning as the loss function can effectively train a G2P conversion model. Further, our multi-task method significantly outperforms both the alignment learning-based model and the attention-based model.
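As an illustration of the two ideas in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes an attention matrix with one row per output phoneme step and one column per input grapheme position, and it assumes the joint objective is a simple weighted interpolation of two loss values. The function names (`argmax_alignment`, `is_monotonic`, `joint_loss`) and the `weight` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def argmax_alignment(attention: Sequence[Sequence[float]]) -> List[int]:
    """For each output step (row), return the most-attended input position."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


def is_monotonic(alignment: Sequence[int]) -> bool:
    """True if the attended input position never moves backwards."""
    return all(a <= b for a, b in zip(alignment, alignment[1:]))


def joint_loss(al_loss: float, attention_loss: float, weight: float = 0.5) -> float:
    """Hypothetical multi-task objective: interpolate the two loss values.

    The interpolation form and the default weight are assumptions made
    for illustration only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * al_loss + (1.0 - weight) * attention_loss


# Toy attention matrices: rows are phoneme steps, columns are graphemes.
monotone = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 0.7]]
jumping = [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]

print(is_monotonic(argmax_alignment(monotone)))  # attended positions 0, 1, 2
print(is_monotonic(argmax_alignment(jumping)))   # attended positions 2, 0
```

The second toy matrix attends to the last grapheme first and then jumps back to the first one, which is the kind of non-monotonic alignment the abstract identifies as harmful for G2P conversion.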
Pages: 7788-7792
Page count: 5
Related papers
50 in total
  • [41] NEURAL GRAPHEME-TO-PHONEME CONVERSION WITH PRE-TRAINED GRAPHEME MODELS
    Dong, Lu
    Guo, Zhi-Qiang
    Tan, Chao-Hong
    Hu, Ya-Jun
    Jiang, Yuan
    Ling, Zhen-Hua
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6202 - 6206
  • [42] Grapheme-to-Phoneme Conversion using Conditional Random Fields
    Illina, Irina
    Fohr, Dominique
    Jouvet, Denis
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2324 - 2327
  • [43] A Maximum Entropy Approach to Chinese Grapheme-to-Phoneme Conversion
    Tsai, Richard Tzong-Han
    Wang, Yu-Chun
    [J]. PROCEEDINGS OF THE 2009 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 411 - +
  • [44] Multilingual grapheme-to-phoneme conversion with global character vectors
    Ni, Jinfu
    Shiga, Yoshinori
    Kawai, Hisashi
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2823 - 2827
  • [45] GRAPHEME-TO-PHONEME CONVERSION METHODS FOR MINORITY LANGUAGE CONDITIONS
    Cao, Mengxue
    Renals, Steve
    Bell, Peter
    Li, Aijun
    Fang, Qiang
    [J]. 2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 151 - 156
  • [46] Input Encoding for Sequence-to-Sequence Learning of Romanian Grapheme-to-Phoneme Conversion
    Stan, Adriana
    [J]. 2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,
  • [47] Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model
    Kubo, Keigo
    Sakti, Sakriani
    Neubig, Graham
    Toda, Tomoki
    Nakamura, Satoshi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06): : 1468 - 1476
  • [48] Novel Two-Stage Model for Grapheme-to-Phoneme Conversion using New Grapheme Generation Rules
    Kheang, Seng
    Katsurada, Kouichi
    Iribe, Yurie
    Nitta, Tsuneo
    [J]. 2014 INTERNATIONAL CONFERENCE OF ADVANCED INFORMATICS: CONCEPT, THEORY AND APPLICATION (ICAICTA), 2014, : 97 - 102
  • [49] Grapheme-to-phoneme conversion based on a fast TBL algorithm in mandarin TTS systems
    Zheng, M
    Shi, Q
    Zhang, W
    Cai, LH
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 600 - 609
  • [50] Using Reversed Sequences and Grapheme Generation Rules to Extend the Feasibility of a Phoneme Transition Network-Based Grapheme-to-Phoneme Conversion
    Kheang, Seng
    Katsurada, Kouichi
    Iribe, Yurie
    Nitta, Tsuneo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (04): : 1182 - 1192