End-to-end text-to-speech synthesis with unaligned multiple language units based on attention

Cited by: 2

Authors:

Aso, Masashi [1]

Takamichi, Shinnosuke [1]

Saruwatari, Hiroshi [1]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

End-to-end; Text-to-speech; Subword; Progressive training; Transformer;

DOI:

10.21437/Interspeech.2020-2347

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

This paper presents the use of unaligned multiple language units for end-to-end text-to-speech (TTS). End-to-end TTS is a promising technology in that it does not require intermediate representation such as prosodic contexts. However, it causes mispronunciation and unnatural prosody. To alleviate this problem, previous methods have used multiple language units, e.g., phonemes and characters, but required the units to be hard-aligned. In this paper, we propose a multi-input attention structure that simultaneously accepts multiple language units without alignments among them. We consider using not only traditional phonemes and characters but also subwords tokenized in a language-independent manner. We also propose a progressive training strategy to deal with the unaligned multiple language units. The experimental results demonstrated that our model and training strategy improve speech quality.


Pages: 4009 - 4013

Page count: 5


[1] EXPLORING END-TO-END NEURAL TEXT-TO-SPEECH SYNTHESIS FOR ROMANIAN
Dumitrache, Marius
Rebedea, Traian
PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE LINGUISTIC RESOURCES AND TOOLS FOR NATURAL LANGUAGE PROCESSING, 2020, : 93 - 102
[2] Myanmar Text-to-Speech Synthesis Using End-to-End Model
Qin, Qinglai
Yang, Jian
Li, Peiying
2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 6 - 11
[3] End-to-End Mongolian Text-to-Speech System
Li, Jingdong
Zhang, Hui
Liu, Rui
Zhang, Xueliang
Bao, Feilong
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 483 - 487
[4] Improving transfer of expressivity for end-to-end multispeaker text-to-speech synthesis
Kulkarni, Ajinkya
Colotte, Vincent
Jouvet, Denis
29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 31 - 35
[5] Knowledge-based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
Li, Jingbei
Wu, Zhiyong
Li, Runnan
Zhi, Pengpeng
Yang, Song
Meng, Helen
INTERSPEECH 2019, 2019, : 4494 - 4498
[6] End-to-End Thai Text-to-Speech with Linguistic Unit
Wisetpaitoon, Kontawat
Singkul, Sattaya
Sakdejayont, Theerat
Chalothorn, Tawunrat
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 951 - 959
[7] NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality
Tan, Xu
Chen, Jiawei
Liu, Haohe
Cong, Jian
Zhang, Chen
Liu, Yanqing
Wang, Xi
Leng, Yichong
Yi, Yuanhao
He, Lei
Zhao, Sheng
Qin, Tao
Soong, Frank
Liu, Tie-Yan
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4234 - 4245
[8] End-to-End Text-To-Speech synthesis for under resourced South African languages
Nthite, Thapelo
Tsoeu, Mohohlo
2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 684 - 689
[9] Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Ahmad, Hawraz A.
Rashid, Tarik A.
ALGORITHMS, 2024, 17 (07)
[10] EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion
Miao, Chenfeng
Zhu, Qingying
Chen, Minchuan
Ma, Jun
Wang, Shaojun
Xiao, Jing
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1650 - 1661
