Multi-speaker Emotional Text-to-speech Synthesizer

被引:0
|
作者
Cho, Sungjae [1 ]
Lee, Soo-Young [2 ]
机构
[1] Korea Inst Sci & Technol, Seoul, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
来源
关键词
emotional speech synthesis; text-to-speech; machine learning; neural network; deep learning;
D O I
暂无
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
We present a methodology to train our multi-speaker emotional text-to-speech synthesizer that can express speech for 10 speakers' 7 different emotions. All silences from audio samples are removed prior to learning. This results in fast learning by our model. Curriculum learning is applied to train our model efficiently. Our model is first trained with a large single-speaker neutral dataset, and then trained with neutral speech from all speakers. Finally, our model is trained using datasets of emotional speech from all speakers. In each stage, training samples of each speaker-emotion pair have equal probability to appear in mini-batches. Through this procedure, our model can synthesize speech for all targeted speakers and emotions. Our synthesized audio sets are available on our web page.
引用
收藏
页码:2337 / 2338
页数:2
相关论文
共 50 条
  • [1] An emotional speech synthesis markup language processor for multi-speaker and emotional text-to-speech applications
    Ryu, Se-Hui
    Cho, Hee
    Lee, Ju-Hyun
    Hong, Ki-Hyung
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 523 - 529
  • [2] ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
    Xue, Jinlong
    Deng, Yayue
    Han, Yichen
    Li, Ya
    Sun, Jianqing
    Liang, Jiaen
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 230 - 234
  • [3] Deep Voice 2: Multi-Speaker Neural Text-to-Speech
    Arik, Sercan O.
    Diamos, Gregory
    Gibiansky, Andrew
    Miller, John
    Peng, Kainan
    Ping, Wei
    Raiman, Jonathan
    Zhou, Yanqi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [4] Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
    Singh, Abhayjeet
    Nagireddi, Amala
    Jayakumar, Anjali
    Deekshitha, G.
    Bandekar, Jesuraja
    Roopa, R.
    Badiger, Sandhya
    Udupa, Sathvik
    Kumar, Saurabh
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Zen, Heiga
    Kumar, Pranaw
    Kant, Kamal
    Bole, Amol
    Singh, Bira Chandra
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    [J]. IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 790 - 798
  • [5] Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    [J]. INTERSPEECH 2020, 2020, : 2032 - 2036
  • [6] LIGHTSPEECH: LIGHTWEIGHT NON-AUTOREGRESSIVE MULTI-SPEAKER TEXT-TO-SPEECH
    Li, Song
    Ouyang, Beibei
    Li, Lin
    Hong, Qingyang
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 499 - 506
  • [7] Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
    Min, Dongchan
    Lee, Dong Bok
    Yang, Eunho
    Hwang, Sung Ju
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [8] Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
    Luong, Hieu-Thi
    Wang, Xin
    Yamagishi, Junichi
    Nishizawa, Nobuyuki
    [J]. INTERSPEECH 2019, 2019, : 1303 - 1307
  • [9] INVESTIGATING ON INCORPORATING PRETRAINED AND LEARNABLE SPEAKER REPRESENTATIONS FOR MULTI-SPEAKER MULTI-STYLE TEXT-TO-SPEECH
    Chien, Chung-Ming
    Lin, Jheng-Hao
    Huang, Chien-yu
    Hsu, Po-chun
    Lee, Hung-yi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8588 - 8592
  • [10] Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment
    Liu, Zhaoyu
    Mak, Brian
    [J]. INTERSPEECH 2020, 2020, : 2932 - 2936