Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Models

Cited by: 5
Authors:
Wang, Xin [1]
Yamagishi, Junichi [1,2]
Affiliations:
[1] National Institute of Informatics, Tokyo, Japan
[2] University of Edinburgh, CSTR, Edinburgh, Midlothian, Scotland
Keywords: speech synthesis; source-filter model; harmonic-plus-noise waveform model; neural network; GENERATION; NETWORKS
DOI: 10.21437/Interspeech.2020-1018
CLC classification: R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes: 100104; 100213
Abstract
Neural source-filter (NSF) waveform models generate speech waveforms by morphing sine-based source signals through dilated convolution in the time domain. Although the sine-based source signals help the NSF models to produce voiced sounds with specified pitch, the sine shape may constrain the generated waveform when the target voiced sounds are less periodic. In this paper, we propose a more flexible source signal called cyclic noise, a quasi-periodic noise sequence given by the convolution of a pulse train and a static random noise with a trainable decaying rate that controls the signal shape. We further propose a masked spectral loss to guide the NSF models to produce periodic voiced sounds from the cyclic noise-based source signal. Results from a large-scale listening test demonstrated the effectiveness of the cyclic noise and the masked spectral loss on speaker-independent NSF models in copy-synthesis experiments on the CMU ARCTIC database.
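The abstract describes the cyclic-noise source as the convolution of a pitch-synchronous pulse train with a single static noise segment shaped by a trainable decay rate. A minimal NumPy sketch of that construction is below; the function name, parameter names, and the exact decay parameterization are illustrative assumptions, not the paper's implementation (where the decay rate is a trainable network parameter and the F0 is time-varying).

```python
import numpy as np

def cyclic_noise(f0_hz, sr=16000, dur_s=0.5, decay=30.0, seed=0):
    """Illustrative cyclic-noise source (assumed parameterization).

    A unit pulse train at the pitch period is convolved with one
    static Gaussian-noise segment under an exponential decay
    envelope, giving a quasi-periodic noise sequence.
    """
    rng = np.random.default_rng(seed)
    n = int(sr * dur_s)                  # total samples
    period = int(sr / f0_hz)             # samples per pitch cycle
    pulse = np.zeros(n)
    pulse[::period] = 1.0                # pulse train at the pitch period
    # One static noise segment, decaying within a pitch cycle;
    # 'decay' stands in for the trainable decay rate in the paper.
    t = np.arange(period) / sr
    noise = rng.standard_normal(period) * np.exp(-decay * f0_hz * t)
    return np.convolve(pulse, noise)[:n]

src = cyclic_noise(100.0)  # 100 Hz source, 0.5 s at 16 kHz
print(src.shape)           # (8000,)
```

With a small `decay` the noise segments overlap and the signal is closer to plain noise; with a large `decay` each cycle collapses toward an impulse, which is the flexibility the paper exploits for less-periodic voiced sounds.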
Pages: 1992-1996 (5 pages)