LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Cited by: 5
Authors
Koizumi, Yuma [1 ]
Zen, Heiga [1 ]
Karita, Shigeki [1 ]
Ding, Yifan [1 ]
Yatabe, Kohei [2 ]
Morioka, Nobuyuki [1 ]
Bacchiani, Michiel [1 ]
Zhang, Yu [3 ]
Han, Wei [3 ]
Bapna, Ankur [3 ]
Affiliations
[1] Google, Tokyo, Japan
[2] Tokyo University of Agriculture and Technology, Tokyo, Japan
[3] Google, Mountain View, CA USA
Source
INTERSPEECH 2023
Keywords
Text-to-speech; dataset; speech restoration
DOI
10.21437/Interspeech.2023-1584
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces a new speech dataset called "LibriTTS-R", designed for text-to-speech (TTS) use. It is derived by applying speech restoration to the LibriTTS corpus, which consists of 585 hours of speech data sampled at 24 kHz from 2,456 speakers, along with the corresponding texts. The constituent samples of LibriTTS-R are identical to those of LibriTTS; only the sound quality has been improved. Experimental results show that the LibriTTS-R ground-truth samples have significantly better sound quality than those in LibriTTS. In addition, a neural end-to-end TTS model trained on LibriTTS-R achieved speech naturalness on par with that of the ground-truth samples. The corpus is freely available for download from http://www.openslr.org/141/.
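The abstract states two checkable properties of the audio: a 24 kHz sampling rate and a total of 585 hours. As a minimal sketch, assuming the downloaded archives have been extracted to a local directory of PCM WAV files that Python's standard `wave` module can read, the hypothetical helper `summarize_wavs` below walks a directory recursively, totals the audio duration, and flags any file whose sampling rate is not 24 kHz. No particular directory layout is assumed beyond the `.wav` extension.

```python
import wave
from pathlib import Path

# Sampling rate stated in the abstract for LibriTTS (and hence LibriTTS-R).
EXPECTED_RATE_HZ = 24_000


def summarize_wavs(root):
    """Hypothetical helper: recursively scan `root` for .wav files.

    Returns a tuple (number_of_files, total_hours, mismatched_paths), where
    mismatched_paths lists files whose sampling rate differs from 24 kHz.
    Assumes every file is PCM WAV readable by the standard `wave` module.
    """
    n_files = 0
    total_seconds = 0.0
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wf.getnframes() / rate
        if rate != EXPECTED_RATE_HZ:
            mismatched.append(path)
        n_files += 1
    return n_files, total_seconds / 3600.0, mismatched
```

Calling it on the root of one extracted subset reports that subset's size in hours; summing over all subsets can be compared against the 585-hour figure. Files stored in a non-PCM encoding would raise `wave.Error` and would need a different reader.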
Pages: 5496-5500
Number of pages: 5
Related papers
50 in total
  • [31] A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization
    Cheon, Sung Jun
    Choi, Byoung Jin
    Kim, Minchan
    Lee, Hyeonseung
    Kim, Nam Soo
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 55 - 59
  • [32] Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios
    Xie, Qicong
    Li, Tao
    Wang, Xinsheng
    Wang, Zhichao
    Xie, Lei
    Yu, Guoqiao
    Wan, Guanglu
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 66 - 70
  • [33] MnTTS2: An Open-Source Multi-speaker Mongolian Text-to-Speech Synthesis Dataset
    Liang, Kailin
    Liu, Bin
    Hu, Yifan
    Liu, Rui
    Bao, Feilong
    Gao, Guanglai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 318 - 329
  • [34] Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech
    Jeong, Myeonghun
    Kim, Minchan
    Choi, Byoung Jin
    Yoon, Jaesam
    Jang, Won
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1519 - 1530
  • [35] CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech
    Bigi, Brigitte
    Zimmermann, Maryvonne
    Andre, Carine
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 987 - 994
  • [36] A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning based on Renyi Divergence Minimization
    Paul, Dipjyoti
    Mukherjee, Sankar
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2021, 2021, : 3625 - 3629
  • [37] Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image
    Goto, Shunsuke
    Onishi, Kotaro
    Saito, Yuki
    Tachibana, Kentaro
    Mori, Koichiro
    INTERSPEECH 2020, 2020, : 1321 - 1325
  • [38] SNAC: Speaker-Normalized Affine Coupling Layer in Flow-Based Architecture for Zero-Shot Multi-Speaker Text-to-Speech
    Choi, Byoung Jin
    Jeong, Myeonghun
    Lee, Joun Yeop
    Kim, Nam Soo
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2502 - 2506
  • [39] J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis
    Takamichi, Shinnosuke
    Nakata, Wataru
    Tanji, Naoko
    Saruwatari, Hiroshi
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2022, 2022-September : 2358 - 2362
  • [40] J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis
    Takamichi, Shinnosuke
    Nakata, Wataru
    Tanji, Naoko
    Saruwatari, Hiroshi
    INTERSPEECH 2022, 2022, : 2358 - 2362