End-to-end Tibetan Speech Synthesis Based on Phones and Semi-syllables

Authors
Li, Guanyu [1]
Luo, Lisai [1]
Gong, Chunwei [1]
Lv, Shiliang [1]
Affiliation
[1] Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou 730000, Gansu, People's Republic of China
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Because Tibetan characters have a two-dimensional (vertically stacked) structure, letter sequences are an inconvenient input for an end-to-end speech synthesis system. Experiments are therefore conducted with phone sequences and semi-syllable sequences as input, respectively. In both training and testing, the text is first segmented into a sequence of syllables, and each syllable is then converted into phones or semi-syllables to form the model's input sequence. The results show that encoder-decoder alignment is better when Tibetan speech synthesis is based on phones than when it is based on semi-syllables. In addition, the Highway network in the architecture plays a key role in the convergence of the model.
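The front-end steps described in the abstract (segment text into syllables, then map each syllable to phones or to semi-syllables) can be sketched as follows. This is a minimal illustration, not the authors' implementation. Splitting on the tsheg (U+0F0B) and shad (U+0F0D) marks follows standard Tibetan orthography. The lexicon `LEXICON`, its romanized "phones", and the rule that a semi-syllable is (onset, remainder of the syllable) are hypothetical toy assumptions; the paper's actual phone inventory and semi-syllable definition are not given here.

```python
# Minimal sketch of a Tibetan TTS front end: text -> syllables -> phones / semi-syllables.
# ASSUMPTIONS (hypothetical): LEXICON entries and their romanized "phones" are toy values,
# and a semi-syllable is taken to be (onset, rest of syllable). Neither is the paper's inventory.

TSHEG = "\u0F0B"  # intersyllabic delimiter
SHAD = "\u0F0D"   # clause/sentence delimiter

# Hypothetical syllable-to-phone lexicon (keys are Tibetan Unicode syllables).
LEXICON = {
    "\u0F56\u0F40\u0FB2": ["t", "a"],             # bkra
    "\u0F64\u0F72\u0F66": ["sh", "i"],            # shis
    "\u0F56\u0F51\u0F7A": ["d", "e"],             # bde
    "\u0F63\u0F7A\u0F42\u0F66": ["l", "e", "k"],  # legs
}


def split_syllables(text: str) -> list[str]:
    """Segment Tibetan text into syllables at tsheg, shad, and whitespace."""
    syllables, current = [], []
    for ch in text:
        if ch in (TSHEG, SHAD) or ch.isspace():
            if current:
                syllables.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        syllables.append("".join(current))
    return syllables


def syllable_to_semi(phones: list[str]) -> list[str]:
    """Split one syllable's phones into [onset, remainder] under the toy assumption."""
    if len(phones) < 2:
        return list(phones)
    return [phones[0], "".join(phones[1:])]


def text_to_phones(text: str) -> list[str]:
    """Flatten the phones of every syllable; raises KeyError for out-of-lexicon syllables."""
    return [p for syl in split_syllables(text) for p in LEXICON[syl]]


def text_to_semi_syllables(text: str) -> list[str]:
    """Flatten the semi-syllables of every syllable (same lexicon lookup as above)."""
    return [s for syl in split_syllables(text) for s in syllable_to_semi(LEXICON[syl])]


if __name__ == "__main__":
    # "bkra shis bde legs" followed by a shad
    sample = ("\u0F56\u0F40\u0FB2\u0F0B\u0F64\u0F72\u0F66\u0F0B"
              "\u0F56\u0F51\u0F7A\u0F0B\u0F63\u0F7A\u0F42\u0F66\u0F0D")
    print(text_to_phones(sample))          # ['t', 'a', 'sh', 'i', 'd', 'e', 'l', 'e', 'k']
    print(text_to_semi_syllables(sample))  # ['t', 'a', 'sh', 'i', 'd', 'e', 'l', 'ek']
```

The phone sequence is longer and finer-grained than the semi-syllable sequence for the same text. In a real system the toy dictionary would be replaced by a full pronunciation lexicon or grapheme-to-phoneme rules for the target dialect.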
Pages
1294-1297
Page count
4