DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech

Cited by: 0
Authors
Adibian, Majid [1 ,2 ]
Zeinali, Hossein [1 ,2 ]
Barmaki, Soroush [1 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Sharif DeepMine Ltd, Tehran, Iran
Keywords
Text-to-speech (TTS); Speech synthesis; Multi-speaker TTS dataset; Persian speech corpus; LIBRISPEECH;
DOI
10.1007/s10579-025-09807-6
CLC classification number
TP39 [Applications of computers];
Subject classification code
081203; 0835
Abstract
Speech synthesis has made significant progress in recent years thanks to deep neural networks (DNNs). However, one of the challenges of DNN-based models is their requirement for large and diverse data, which limits their applicability to many languages and domains. To date, no multi-speaker text-to-speech (TTS) dataset has been available in Persian, which hinders the development of such models for this language. In this paper, we present a novel dataset for multi-speaker TTS in Persian, consisting of 120 hours of high-quality speech from 67 speakers. We use this dataset to train two synthesizers and a vocoder and evaluate the quality of the synthesized speech. The results show that the naturalness of the generated samples, measured by the mean opinion score (MOS), is 3.94 and 4.12 for the two trained multi-speaker synthesizers, indicating that the dataset is suitable for training multi-speaker TTS models and can facilitate future research in this area for Persian.
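For context, MOS values such as the 3.94 and 4.12 reported above are typically obtained by averaging listeners' 1-5 naturalness ratings over the rated utterances, usually reported together with a 95% confidence interval. The following is a minimal sketch of that computation; the ratings and helper name are hypothetical and not taken from the paper.

```python
# Minimal sketch: mean opinion score (MOS) with an approximate 95% confidence
# interval from 1-5 naturalness ratings. Ratings below are hypothetical.
import math
from statistics import mean, stdev

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of the ~95% confidence interval)."""
    m = mean(ratings)
    ci = z * stdev(ratings) / math.sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return m, ci

# Hypothetical listener ratings for utterances from one synthesizer.
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```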
Pages: 20
Related papers
50 records in total
  • [1] LIGHT-TTS: LIGHTWEIGHT MULTI-SPEAKER MULTI-LINGUAL TEXT-TO-SPEECH
    Li, Song
    Ouyang, Beibei
    Li, Lin
    Hong, Qingyang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8383 - 8387
  • [2] LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
    Koizumi, Yuma
    Zen, Heiga
    Karita, Shigeki
    Ding, Yifan
    Yatabe, Kohei
    Morioka, Nobuyuki
    Bacchiani, Michiel
    Zhang, Yu
    Han, Wei
    Bapna, Ankur
    INTERSPEECH 2023, 2023, : 5496 - 5500
  • [3] Multi-speaker Emotional Text-to-speech Synthesizer
    Cho, Sungjae
    Lee, Soo-Young
    INTERSPEECH 2021, 2021, : 2337 - 2338
  • [4] Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Toda, Tomoki
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2995 - 2999
  • [5] Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
    Singh, Abhayjeet
    Nagireddi, Amala
    Jayakumar, Anjali
    Deekshitha, G.
    Bandekar, Jesuraja
    Roopa, R.
    Badiger, Sandhya
    Udupa, Sathvik
    Kumar, Saurabh
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Zen, Heiga
    Kumar, Pranaw
    Kant, Kamal
    Bole, Amol
    Singh, Bira Chandra
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 790 - 798
  • [6] Deep Voice 2: Multi-Speaker Neural Text-to-Speech
    Arik, Sercan O.
    Diamos, Gregory
    Gibiansky, Andrew
    Miller, John
    Peng, Kainan
    Ping, Wei
    Raiman, Jonathan
    Zhou, Yanqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [7] ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
    Xue, Jinlong
    Deng, Yayue
    Han, Yichen
    Li, Ya
    Sun, Jianqing
    Liang, Jiaen
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 230 - 234
  • [8] A MULTI PURPOSE AND LARGE SCALE SPEECH CORPUS IN PERSIAN AND ENGLISH FOR SPEAKER AND SPEECH RECOGNITION: THE DEEPMINE DATABASE
    Zeinali, Hossein
    Burget, Lukas
    Cernocky, Jan Honza
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 397 - 402
  • [9] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
    Luong, Hieu-Thi
    Wang, Xin
    Yamagishi, Junichi
    Nishizawa, Nobuyuki
    INTERSPEECH 2019, 2019, : 1303 - 1307
  • [10] Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2020, 2020, : 2032 - 2036