DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech

Cited: 0
Authors
Adibian, Majid [1 ,2 ]
Zeinali, Hossein [1 ,2 ]
Barmaki, Soroush [1 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Sharif DeepMine Ltd, Tehran, Iran
Keywords
Text-to-speech (TTS); Speech synthesis; Multi-speaker TTS dataset; Persian speech corpus; LibriSpeech
DOI
10.1007/s10579-025-09807-6
CLC number
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
Speech synthesis has made significant progress in recent years thanks to deep neural networks (DNNs). However, one challenge for DNN-based models is their need for large and diverse training data, which limits their applicability in many languages and domains. To date, no multi-speaker text-to-speech (TTS) dataset has been available for Persian, which has hindered the development of such models for this language. In this paper, we present a novel multi-speaker TTS dataset for Persian, consisting of 120 hours of high-quality speech from 67 speakers. We use this dataset to train two synthesizers and a vocoder and evaluate the quality of the synthesized speech. The naturalness of the generated samples, measured by the mean opinion score (MOS) criterion, is 3.94 and 4.12 for the two trained multi-speaker synthesizers, indicating that the dataset is suitable for training multi-speaker TTS models and can facilitate future research in this area for Persian.
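The MOS values reported in the abstract (3.94 and 4.12) are averages of subjective listener ratings on a 1-5 absolute category scale. As a minimal illustrative sketch (the ratings below are hypothetical, not taken from the paper), MOS can be computed like this:

```python
import statistics

def mean_opinion_score(ratings):
    """Mean of listener ratings on the 1-5 absolute category rating scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return statistics.mean(ratings)

# Hypothetical ratings from eight listeners for one synthesized utterance
scores = [4, 4, 5, 3, 4, 4, 5, 4]
mos = mean_opinion_score(scores)
```

In practice, MOS studies average over many utterances and listeners per system, so the per-system scores cited in the abstract aggregate a much larger set of ratings than this sketch shows.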
Pages: 20
Related papers
50 items in total
  • [31] Design of a Yoruba Language Speech Corpus for the Purposes of Text-to-Speech (TTS) Synthesis
    Dagba, Theophile K.
    Aoga, John O. R.
    Fanou, Codjo C.
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2016, PT I, 2016, 9621 : 161 - 169
  • [32] Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios
    Xie, Qicong
    Li, Tao
    Wang, Xinsheng
    Wang, Zhichao
    Xie, Lei
    Yu, Guoqiao
    Wan, Guanglu
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 66 - 70
  • [33] Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech
    Jeong, Myeonghun
    Kim, Minchan
    Choi, Byoung Jin
    Yoon, Jaesam
    Jang, Won
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1519 - 1530
  • [34] CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech
    Bigi, Brigitte
    Zimmermann, Maryvonne
    Andre, Carine
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 987 - 994
  • [35] SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems
    Yoon, Hyungchan
    Kim, Changhwan
    Um, Seyun
    Yoon, Hyun-Wook
    Kang, Hong-Goo
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 593 - 597
  • [36] MnTTS2: An Open-Source Multi-speaker Mongolian Text-to-Speech Synthesis Dataset
    Liang, Kailin
    Liu, Bin
    Hu, Yifan
    Liu, Rui
    Bao, Feilong
    Gao, Guanglai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 318 - 329
  • [37] Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus
    Liang, Kailin
    Liu, Bin
    Hu, Yifan
    Liu, Rui
    Bao, Feilong
    Gao, Guanglai
    APPLIED SCIENCES-BASEL, 2023, 13 (07)
  • [38] A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning based on Renyi Divergence Minimization
    Paul, Dipjyoti
    Mukherjee, Sankar
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2021, 2021, : 3625 - 3629
  • [39] AISHELL-3: A Multi-Speaker Mandarin TTS Corpus
    Shi, Yao
    Bu, Hui
    Xu, Xin
    Zhang, Shaoji
    Li, Ming
    INTERSPEECH 2021, 2021, : 2756 - 2760
  • [40] Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image
    Goto, Shunsuke
    Onishi, Kotaro
    Saito, Yuki
    Tachibana, Kentaro
    Mori, Koichiro
    INTERSPEECH 2020, 2020, : 1321 - 1325