AN INVESTIGATION OF MULTI-SPEAKER TRAINING FOR WAVENET VOCODER

Authors
Hayashi, Tomoki [1]
Tamamori, Akira [2]
Kobayashi, Kazuhiro [3]
Takeda, Kazuya [1]
Toda, Tomoki [3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi, Japan
[2] Nagoya Univ, Inst Innovat Future Soc, Nagoya, Aichi, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi, Japan
Keywords
Speech synthesis; Vocoder; WaveNet; Convolutional neural network; SPEECH SYNTHESIS SYSTEM; REPRESENTATION; SPECTRUM; SIGNALS; F0;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the effectiveness of multi-speaker training for the WaveNet vocoder. In our previous work, we demonstrated that our proposed speaker-dependent (SD) WaveNet vocoder, which is trained on a single speaker's speech data, is capable of modeling temporal waveform structure, such as phase information, and generates more natural-sounding synthetic voices than the conventional high-quality vocoder STRAIGHT. However, because of its speaker-dependent nature, it is still difficult to generate the synthetic voices of various speakers with the SD WaveNet vocoder. Toward the development of a speaker-independent WaveNet vocoder, we apply multi-speaker training techniques to the WaveNet vocoder and investigate their effectiveness. The experimental results demonstrate that 1) the multi-speaker WaveNet vocoder still outperforms STRAIGHT in generating known speakers' voices, but it is comparable to STRAIGHT in generating unknown speakers' voices, and 2) multi-speaker training is effective for developing a WaveNet vocoder capable of speech modification.
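The abstract says the WaveNet vocoder models temporal waveform structure directly. As a hedged illustration only, the sketch below shows one generic WaveNet-style layer: a causal dilated convolution followed by the conditional gated activation z = tanh(W_f * x + V_f h) * sigmoid(W_g * x + V_g h), where h is an auxiliary acoustic-feature sequence. This is not the authors' implementation. The single-channel signals, the kernel size of 2, the scalar conditioning weights, and the function names `causal_dilated_conv` and `gated_layer` are illustrative assumptions. The sketch also assumes that h has already been upsampled to the waveform sample rate.

```python
import math
from typing import List


def causal_dilated_conv(x: List[float], w: List[float], dilation: int) -> List[float]:
    """Kernel-size-2 causal convolution: y[t] = w[0]*x[t-dilation] + w[1]*x[t].

    Samples before the start of the signal are treated as zeros (left padding),
    so y[t] never depends on x[t+1], x[t+2], ...
    """
    y = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        y.append(w[0] * past + w[1] * x[t])
    return y


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_layer(x, h, w_f, w_g, v_f, v_g, dilation):
    """Hypothetical single-channel WaveNet-style layer with local conditioning.

    Returns (z, residual_out) where
      z[t] = tanh(f[t] + v_f*h[t]) * sigmoid(g[t] + v_g*h[t])
      residual_out[t] = x[t] + z[t]
    and f, g are causal dilated convolutions of x (filter and gate branches).
    """
    # Assumption: h is already upsampled to one value per waveform sample.
    assert len(h) == len(x), "h must already be upsampled to the waveform rate"
    f = causal_dilated_conv(x, w_f, dilation)
    g = causal_dilated_conv(x, w_g, dilation)
    z = [math.tanh(f[t] + v_f * h[t]) * sigmoid(g[t] + v_g * h[t]) for t in range(len(x))]
    residual_out = [x[t] + z[t] for t in range(len(x))]
    return z, residual_out


if __name__ == "__main__":
    x = [0.0, 0.5, -0.5, 1.0]   # toy waveform samples
    h = [1.0, 1.0, -1.0, -1.0]  # toy auxiliary feature, one value per sample
    z, out = gated_layer(x, h, w_f=[0.3, 0.7], w_g=[0.2, 0.4], v_f=0.5, v_g=-0.5, dilation=2)
    print(z)
    print(out)
```

Each output sample depends only on current and past inputs, so changing a future sample leaves earlier outputs unchanged. Stacking such layers with growing dilations (1, 2, 4, ...) is how WaveNet-style models cover long waveform contexts.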
Pages: 712-718
Number of pages: 7