UNSUPERVISED POLYGLOT TEXT-TO-SPEECH

Cited by: 0

Authors:

Nachmani, Eliya [1,2]

Wolf, Lior [1,2]

Affiliations:

[1] Facebook AI Res, Menlo Pk, CA 94025 USA

[2] Tel Aviv Univ, Tel Aviv, Israel

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

TTS; multilingual; unsupervised learning;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We present a TTS neural network that is able to produce speech in multiple languages. The proposed network is able to transfer a voice, which was presented as a sample in a source language, into one of several target languages. Training is done without using matching or parallel data, i.e., without samples of the same speaker in multiple languages, making the method much more applicable. The conversion is based on learning a polyglot network that has multiple per-language sub-networks and adding loss terms that preserve the speaker's identity in multiple languages. We evaluate the proposed polyglot neural network for three languages with a total of more than 400 speakers and demonstrate convincing conversion capabilities.


Pages: 7055-7059

Page count: 5

