Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression

Cited by: 5
Authors
Wu, Yi-Chiao [1]
Tobing, Patrick Lumban [1]
Kobayashi, Kazuhiro [2]
Hayashi, Tomoki [3]
Toda, Tomoki [2]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi 4648601, Japan
[2] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi 4648601, Japan
[3] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648601, Japan
Source
IEEE Access | 2020 / Vol. 8
Funding
Japan Society for the Promotion of Science; Japan Science and Technology Agency;
Keywords
Non-parallel voice conversion; WaveNet vocoder; collapsed speech segment detection; linear predictive coding distribution constraint; NEURAL-NETWORKS; REPRESENTATIONS; GENERATION;
DOI
10.1109/ACCESS.2020.2984007
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.
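The idea of restricting a corrective constraint to detected segments can be illustrated with a minimal sketch. The abstract does not say how the CSSD decides that a segment has collapsed or how the LPCDC is computed, so everything below is an assumption made for illustration: the frame-power comparison against the reference speech, the function names, the frame length, and the 6 dB threshold are hypothetical and are not taken from the paper. The `constrained` input simply stands in for a waveform generated with some constraint applied.

```python
import math


def frame_power_db(wave, frame_len):
    """Mean power of each non-overlapping frame, in dB."""
    powers = []
    for start in range(0, len(wave) - frame_len + 1, frame_len):
        frame = wave[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small offset avoids log10(0) on silent frames.
        powers.append(10.0 * math.log10(mean_sq + 1e-10))
    return powers


def detect_collapsed_frames(generated, reference, frame_len=4, threshold_db=6.0):
    """Flag frames whose power exceeds the reference by more than threshold_db.

    Hypothetical detector: a stand-in for the CSSD, not the paper's method.
    """
    gen = frame_power_db(generated, frame_len)
    ref = frame_power_db(reference, frame_len)
    return [g - r > threshold_db for g, r in zip(gen, ref)]


def select_output(generated, constrained, flags, frame_len=4):
    """Keep unconstrained samples, swapping in constrained ones only where flagged."""
    out = list(generated)
    for i, flagged in enumerate(flags):
        if flagged:
            start = i * frame_len
            out[start:start + frame_len] = constrained[start:start + frame_len]
    return out


# Toy signals: the second frame of `generated` is far louder than the reference.
reference = [0.1] * 8
generated = [0.1] * 4 + [0.9] * 4
constrained = [0.1] * 8

flags = detect_collapsed_frames(generated, reference)
output = select_output(generated, constrained, flags)
```

In this toy case only the second frame is replaced, which mirrors the stated goal of limiting any quality loss from the constraint to short periods.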
Pages: 62094-62106
Number of pages: 13