Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Cited by: 5
Authors
Wang, Xin [1]
Yamagishi, Junichi [1, 2]
Affiliations
[1] Natl Inst Informat, Tokyo, Japan
[2] Univ Edinburgh, CSTR, Edinburgh, Midlothian, Scotland
Keywords
speech synthesis; source-filter model; harmonic-plus-noise waveform model; neural network; GENERATION; NETWORKS
DOI
10.21437/Interspeech.2020-1018
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Neural source-filter (NSF) waveform models generate speech waveforms by morphing sine-based source signals through dilated convolution in the time domain. Although the sine-based source signals help the NSF models to produce voiced sounds with specified pitch, the sine shape may constrain the generated waveform when the target voiced sounds are less periodic. In this paper, we propose a more flexible source signal called cyclic noise, a quasi-periodic noise sequence given by the convolution of a pulse train and a static random noise with a trainable decaying rate that controls the signal shape. We further propose a masked spectral loss to guide the NSF models to produce periodic voiced sounds from the cyclic noise-based source signal. Results from a large-scale listening test demonstrated the effectiveness of the cyclic noise and the masked spectral loss on speaker-independent NSF models in copy-synthesis experiments on the CMU ARCTIC database.
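The cyclic-noise construction described in the abstract (a pulse train convolved with decaying random noise) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pulse_train` and `cyclic_noise`, the exponential form of the decay, the kernel length, and the fixed `decay_rate` value are all assumptions made for illustration. In the paper the decaying rate is trainable, whereas here it is a plain constant.

```python
import numpy as np


def pulse_train(f0, sample_rate):
    """Place one unit pulse each time the accumulated phase passes an integer.

    f0 is a per-sample fundamental frequency in Hz; 0 marks unvoiced samples.
    (Hypothetical helper, not taken from the paper.)
    """
    f0 = np.asarray(f0, dtype=float)
    phase = np.cumsum(f0 / sample_rate)   # phase measured in cycles
    cycles = np.floor(phase)
    pulses = np.zeros_like(f0)
    pulses[1:] = (np.diff(cycles) > 0).astype(float)
    return pulses


def cyclic_noise(f0, sample_rate, decay_rate=200.0, kernel_len=400, seed=0):
    """Convolve a pulse train with a fixed, exponentially decaying noise burst.

    decay_rate (1/s) is an assumed constant standing in for the trainable
    decaying rate mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(kernel_len) / sample_rate
    # The noise is drawn once ("static") and reused for every pitch cycle.
    kernel = rng.standard_normal(kernel_len) * np.exp(-decay_rate * t)
    pulses = pulse_train(f0, sample_rate)
    # Keep only the first len(f0) samples so the output aligns with the input.
    return np.convolve(pulses, kernel)[: len(pulses)]


if __name__ == "__main__":
    f0 = np.full(1680, 100.0)             # 105 ms of a 100 Hz voiced segment
    source = cyclic_noise(f0, sample_rate=16000)
    print(source.shape)
```

A larger `decay_rate` makes each noise burst shorter, so the signal approaches a pulse train; a smaller value lets bursts overlap and the signal becomes more noise-like, which is the shape control the abstract refers to.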
Pages: 1992-1996
Page count: 5