End-to-end text-to-speech synthesis with unaligned multiple language units based on attention

Cited by: 2
Authors
Aso, Masashi [1]
Takamichi, Shinnosuke [1]
Saruwatari, Hiroshi [1]
Affiliation
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
Source
Keywords
End-to-end; Text-to-speech; Subword; Progressive training; Transformer
DOI
10.21437/Interspeech.2020-2347
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
This paper presents the use of unaligned multiple language units for end-to-end text-to-speech (TTS). End-to-end TTS is a promising technology in that it does not require intermediate representations such as prosodic contexts. However, it can cause mispronunciation and unnatural prosody. To alleviate this problem, previous methods have used multiple language units, e.g., phonemes and characters, but have required the units to be hard-aligned. In this paper, we propose a multi-input attention structure that simultaneously accepts multiple language units without alignments among them. We consider using not only traditional phonemes and characters but also subwords tokenized in a language-independent manner. We also propose a progressive training strategy to deal with the unaligned multiple language units. The experimental results demonstrate that our model and training strategy improve speech quality.
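The idea of attending to several unaligned unit sequences at once can be illustrated with a small sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention, toy 2-dimensional encodings, and concatenation as the way to combine the per-unit context vectors, and the names `attend` and `multi_input_attend` are hypothetical. The only point it shows is that the phoneme and character sequences may have different lengths, because each gets its own attention weights and no alignment between them is needed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over one sequence."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, memory)) for d in range(dim)]
    return context, weights


def multi_input_attend(query, memories):
    """Attend to each language-unit sequence separately, then combine.

    Concatenating the contexts is an assumption made for illustration;
    the abstract does not say how the contexts are merged.
    """
    combined = []
    weights_by_unit = {}
    for name, memory in memories.items():
        context, weights = attend(query, memory)
        combined.extend(context)
        weights_by_unit[name] = weights
    return combined, weights_by_unit


# Toy encoder outputs (made-up numbers): 5 phoneme steps, 3 character steps.
phonemes = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.9], [0.7, 0.7], [-0.4, 0.2]]
characters = [[0.2, 0.1], [0.6, 0.4], [-0.3, 0.8]]
query = [0.5, 0.5]  # stands in for one decoder state

context, weights = multi_input_attend(
    query, {"phoneme": phonemes, "character": characters}
)
print(len(context), len(weights["phoneme"]), len(weights["character"]))
```

Each unit type keeps its own weight distribution over its own sequence length, so adding a third input such as subwords would only mean adding another entry to the dictionary.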
Pages: 4009 - 4013
Page count: 5
Related papers
50 in total
  • [11] Adaptive End-to-End Text-to-Speech Synthesis Based on Error Correction Feedback from Humans
    Fujii, Kazuki
    Saito, Yuki
    Saruwatari, Hiroshi
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1702 - 1707
  • [13] Effective Emotion Transplantation in an End-to-End Text-to-Speech System
    Joo, Young-Sun
    Bae, Hanbin
    Kim, Young-Ik
    Cho, Hoon-Young
    Kang, Hong-Goo
    IEEE ACCESS, 2020, 8 : 161713 - 161719
  • [14] FPETS: Fully Parallel End-to-End Text-to-Speech System
    Ma, Dabiao
    Su, Zhiba
    Wang, Wenxuan
    Lu, Yuhao
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8457 - 8463
  • [15] Myanmar Text-to-Speech System based on Tacotron (End-to-End Generative Model)
    Win, Yuzana
    Lwin, Htoo Pyae
    Masada, Tomonari
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 572 - 577
  • [16] Multi speaker text-to-speech synthesis using generalized end-to-end loss function
    Nazir, Owais
    Malik, Aruna
    Singh, Samayveer
    Pathan, Al-Sakib Khan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (24) : 64205 - 64222
  • [17] Improvement of the end-to-end scene text recognition method for "text-to-speech" conversion
    Makhmudov, Fazliddin
    Mukhiddinov, Mukhriddin
    Abdusalomov, Akmalbek
    Avazov, Kuldoshbay
    Khamdamov, Utkir
    Cho, Young Im
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
  • [18] WAVE-TACOTRON: SPECTROGRAM-FREE END-TO-END TEXT-TO-SPEECH SYNTHESIS
    Weiss, Ron J.
    Skerry-Ryan, R. J.
    Battenberg, Eric
    Mariooryad, Soroosh
    Kingma, Diederik P.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5679 - 5683
  • [19] END-TO-END TEXT-TO-SPEECH USING LATENT DURATION BASED ON VQ-VAE
    Yasuda, Yusuke
    Wang, Xin
    Yamagishi, Junichi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5694 - 5698
  • [20] Investigation of Input Alphabets of End-to-End Lithuanian Text-to-Speech Synthesizer
    Kasparaitis, Pijus
    Antanavicius, Danielius
    BALTIC JOURNAL OF MODERN COMPUTING, 2023, 11 (02) : 285 - 296