An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets

Cited by: 2
Authors
Gallegos, Pilar Oplustil [1]
Williams, Jennifer [1]
Rownicka, Joanna [1]
King, Simon [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council
Keywords
speech synthesis; data; clustering; speaker representation; sequence-to-sequence models; multi-speaker;
DOI
10.21437/Interspeech.2020-2567
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Large multi-speaker datasets for TTS typically contain diverse speakers, recording conditions, styles and quality of data. Although one might generally presume that more data is better, in this paper we show that a model trained on a carefully-chosen subset of speakers from LibriTTS provides significantly better quality synthetic speech than a model trained on a larger set. We propose an unsupervised methodology to find this subset by clustering per-speaker acoustic representations.
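The selection idea described in the abstract, clustering per-speaker acoustic representations and keeping only the speakers from a well-behaved cluster, can be sketched as below. This is a minimal illustration, not the paper's implementation: the choice of embedding, the number of clusters, the use of scikit-learn's `KMeans`, and the keep-the-largest-cluster criterion are all placeholder assumptions (the paper selects the subset by how well the resulting TTS models synthesize speech).

```python
import numpy as np
from sklearn.cluster import KMeans


def select_speaker_subset(embeddings: dict, n_clusters: int = 3, seed: int = 0):
    """Cluster per-speaker embeddings; return speaker IDs of one cluster.

    embeddings: {speaker_id: 1-D acoustic representation (e.g. a mean
    speaker embedding over that speaker's utterances)}.
    """
    ids = sorted(embeddings)
    X = np.stack([embeddings[s] for s in ids])
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(X)
    # Placeholder criterion: keep the most populous cluster. In practice the
    # cluster would be chosen by training TTS models per cluster and
    # comparing synthesis quality, as the abstract describes.
    best = np.bincount(labels).argmax()
    return [s for s, lab in zip(ids, labels) if lab == best]
```

For example, with embeddings that form two well-separated groups and `n_clusters=2`, the function returns the speaker IDs of the larger group.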
Pages: 1758-1762 (5 pages)