AN INVESTIGATION OF MULTI-SPEAKER TRAINING FOR WAVENET VOCODER

Authors
Hayashi, Tomoki [1]
Tamamori, Akira [2]
Kobayashi, Kazuhiro [3]
Takeda, Kazuya [1]
Toda, Tomoki [3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi, Japan
[2] Nagoya Univ, Inst Innovat Future Soc, Nagoya, Aichi, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi, Japan
Keywords
Speech synthesis; Vocoder; WaveNet; Convolutional neural network; SPEECH SYNTHESIS SYSTEM; REPRESENTATION; SPECTRUM; SIGNALS; F0;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we investigate the effectiveness of multi-speaker training for the WaveNet vocoder. In our previous work, we demonstrated that our proposed speaker-dependent (SD) WaveNet vocoder, which is trained on a single speaker's speech data, can model temporal waveform structure such as phase information, and can generate more natural-sounding synthetic voices than a conventional high-quality vocoder, STRAIGHT. However, it is still difficult to generate synthetic voices of various speakers with the SD-WaveNet vocoder because of its speaker-dependent nature. Towards the development of a speaker-independent WaveNet vocoder, we apply multi-speaker training techniques to the WaveNet vocoder and investigate their effectiveness. The experimental results demonstrate that 1) the multi-speaker WaveNet vocoder still outperforms STRAIGHT in generating known speakers' voices but is only comparable to STRAIGHT in generating unknown speakers' voices, and 2) multi-speaker training is effective for developing a WaveNet vocoder capable of speech modification.
Pages: 712-718
Page count: 7