Hi-Fi Multi-Speaker English TTS Dataset

Cited by: 14
Authors
Bakhturina, Evelina [1]
Lavrukhin, Vitaly [1]
Ginsburg, Boris [1]
Zhang, Yang [1]
Affiliation
[1] NVIDIA, Santa Clara, CA 95051 USA
DOI
10.21437/Interspeech.2021-1599
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
This paper introduces a new multi-speaker English dataset for training text-to-speech models. The dataset is based on LibriVox audiobooks and Project Gutenberg texts, both in the public domain. The new dataset contains about 292 hours of speech from 10 speakers with at least 17 hours per speaker sampled at 44.1 kHz. To select speech samples with high quality, we considered audio recordings with a signal bandwidth of at least 13 kHz and a signal-to-noise ratio (SNR) of at least 32 dB. The dataset is publicly released at "http://www.openslr.org/109/".
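The selection criteria in the abstract (signal bandwidth of at least 13 kHz and SNR of at least 32 dB) can be illustrated with a minimal sketch. It assumes that bandwidth and SNR have already been estimated for each recording by some other tool; the abstract does not say how the authors measured them. The `Recording` class, its field names, and both helper functions are hypothetical and are not taken from the paper or the released dataset.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_BANDWIDTH_HZ = 13_000.0
MIN_SNR_DB = 32.0


@dataclass(frozen=True)
class Recording:
    """Hypothetical per-recording metadata; field names are illustrative."""
    speaker_id: str
    bandwidth_hz: float  # estimated signal bandwidth (assumed precomputed)
    snr_db: float        # estimated signal-to-noise ratio (assumed precomputed)
    duration_s: float    # length of the recording in seconds


def is_high_quality(rec: Recording) -> bool:
    """Return True if the recording meets both quality thresholds."""
    return rec.bandwidth_hz >= MIN_BANDWIDTH_HZ and rec.snr_db >= MIN_SNR_DB


def hours_per_speaker(recordings):
    """Sum the hours of qualifying recordings for each speaker."""
    totals = {}
    for rec in recordings:
        if is_high_quality(rec):
            hours = rec.duration_s / 3600.0
            totals[rec.speaker_id] = totals.get(rec.speaker_id, 0.0) + hours
    return totals
```

Calling `hours_per_speaker` on a list of `Recording` objects gives the retained hours per speaker, which could then be compared against the 17-hour minimum mentioned in the abstract.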
Pages: 2776-2780
Number of pages: 5