UNSUPERVISED POLYGLOT TEXT-TO-SPEECH

Cited by: 0
Authors
Nachmani, Eliya [1, 2]
Wolf, Lior [1, 2]
Affiliations
[1] Facebook AI Res, Menlo Pk, CA 94025 USA
[2] Tel Aviv Univ, Tel Aviv, Israel
Keywords
TTS; multilingual; unsupervised learning
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a TTS neural network that is able to produce speech in multiple languages. The proposed network is able to transfer a voice, which was presented as a sample in a source language, into one of several target languages. Training is done without using matching or parallel data, i.e., without samples of the same speaker in multiple languages, making the method much more applicable. The conversion is based on learning a polyglot network that has multiple per-language sub-networks and adding loss terms that preserve the speaker's identity in multiple languages. We evaluate the proposed polyglot neural network for three languages with a total of more than 400 speakers and demonstrate convincing conversion capabilities.
Pages: 7055-7059
Page count: 5