DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech

Cited by: 0
Authors
Adibian, Majid [1 ,2 ]
Zeinali, Hossein [1 ,2 ]
Barmaki, Soroush [1 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Sharif DeepMine Ltd, Tehran, Iran
Keywords
Text-to-speech (TTS); Speech synthesis; Multi-speaker TTS dataset; Persian speech corpus; LibriSpeech
DOI
10.1007/s10579-025-09807-6
Chinese Library Classification (CLC)
TP39 [Applications of computers]
Discipline classification codes
081203; 0835
Abstract
Speech synthesis has made significant progress in recent years thanks to deep neural networks (DNNs). However, one challenge of DNN-based models is their need for large and diverse training data, which limits their applicability to many languages and domains. To date, no multi-speaker text-to-speech (TTS) dataset has been available for Persian, which has hindered the development of such models for this language. In this paper, we present a novel dataset for multi-speaker TTS in Persian, consisting of 120 hours of high-quality speech from 67 speakers. We use this dataset to train two synthesizers and a vocoder and evaluate the quality of the synthesized speech. The naturalness of the generated samples, measured by the mean opinion score (MOS), is 3.94 and 4.12 for the two trained multi-speaker synthesizers, which indicates that the dataset is suitable for training multi-speaker TTS models and can facilitate future research in this area for Persian.
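As context for the MOS figures reported in the abstract, the minimal Python sketch below shows how a mean opinion score and a 95% confidence interval are conventionally computed from 1-5 listener ratings. The ratings and the mos_with_ci helper are illustrative assumptions for this record, not data or code from the paper.

import math

def mos_with_ci(ratings, z=1.96):
    """Return the mean opinion score and its 95% confidence half-width."""
    n = len(ratings)
    mean = sum(ratings) / n
    # Unbiased sample variance of the listener ratings.
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    return mean, z * math.sqrt(var / n)

# Hypothetical 1-5 naturalness ratings from a small listening test.
ratings = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # e.g. MOS = 4.00 +/- 0.41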
Pages: 20
Related papers
50 items in total
  • [21] Tu, Tao; Chen, Yuan-Jui; Liu, Alexander H.; Lee, Hung-yi. Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation. INTERSPEECH 2020, 2020: 3191-3195
  • [22] Hsieh, Cheng-Ping; Ghosh, Subhankar; Ginsburg, Boris. Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers. INTERSPEECH 2023, 2023: 3028-3032
  • [23] Zhang, Xulong; Wang, Jianzong; Cheng, Ning; Xiao, Jing. Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data. 2022 18th International Conference on Mobility, Sensing and Networking (MSN), 2022: 456-460
  • [24] Cooper, Erica; Lai, Cheng-I; Yasuda, Yusuke; Fang, Fuming; Wang, Xin; Chen, Nanxin; Yamagishi, Junichi. Zero-Shot Multi-Speaker Text-to-Speech with State-of-the-Art Neural Speaker Embeddings. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 6184-6188
  • [25] Yoon, Hyungchan; Kim, Changhwan; Song, Eunwoo; Yoon, Hyun-Wook; Kang, Hong-Goo. Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech. INTERSPEECH 2023, 2023: 4299-4303
  • [26] Choi, Byoung Jin; Jeong, Myeonghun; Kim, Minchan; Mun, Sung Hwan; Kim, Nam Soo. Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech. Proceedings of the 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022: 1708-1712
  • [27] Cheon, Sung Jun; Choi, Byoung Jin; Kim, Minchan; Lee, Hyeonseung; Kim, Nam Soo. A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization. IEEE Signal Processing Letters, 2022, 29: 55-59
  • [28] Bang, Chae-Woon; Chun, Chanjun. Effective Zero-Shot Multi-Speaker Text-to-Speech Technique Using Information Perturbation and a Speaker Encoder. Sensors, 2023, 23(23)
  • [29] Casanova, Edresson; Shulby, Christopher; Gölge, Eren; Müller, Nicolas Michael; de Oliveira, Frederico Santos; Candido Junior, Arnaldo; Soares, Anderson da Silva; Aluisio, Sandra Maria; Ponti, Moacir Antonelli. SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-To-Speech Model. INTERSPEECH 2021, 2021: 3645-3649
  • [30] Zhao, Botao; Zhang, Xulong; Wang, Jianzong; Cheng, Ning; Xiao, Jing. nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-Speaker Text-to-Speech. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4293-4297