Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis

Cited by: 11
Authors
Espic, Felipe [1]
Valentini-Botinhao, Cassia [1]
King, Simon [1]
Affiliations
[1] Univ Edinburgh, CSTR, Edinburgh, Midlothian, Scotland
Keywords
speech synthesis; vocoding; speech features; phase modelling; spectral representation
DOI
10.21437/Interspeech.2017-1647
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a simple new representation for the FFT spectrum tailored to statistical parametric speech synthesis. It consists of four feature streams that describe magnitude, phase and fundamental frequency using real numbers. The proposed feature extraction method does not attempt to decompose the speech structure (e.g., into source+filter or harmonics+noise). By avoiding the simplifications inherent in decomposition, we can dramatically reduce the "phasiness" and "buzziness" typical of most vocoders. The method uses simple and computationally cheap operations and can operate at a lower frame rate than the 200 frames-per-second typical in many systems. It avoids heuristics and methods requiring approximate or iterative solutions, including phase unwrapping. Two DNN-based acoustic models were built - from male and female speech data - using the Merlin toolkit. Subjective comparisons were made with a state-of-the-art baseline, using the STRAIGHT vocoder. In all variants tested, and for both male and female voices, the proposed method substantially outperformed the baseline. We provide source code to enable our complete system to be replicated.
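The abstract's point about describing magnitude and phase with real numbers, without phase unwrapping, can be illustrated with a minimal sketch. It assumes (this is not stated in the record) that phase is carried by the real and imaginary parts of each FFT bin divided by the bin's magnitude, so every value is bounded and no unwrapping is needed. The function names, window, and FFT size are hypothetical, the fundamental-frequency stream is omitted, and this is not the authors' released implementation.

```python
import numpy as np

def frame_to_streams(frame, n_fft=512, eps=1e-12):
    """Split one speech frame into real-valued magnitude and phase streams.

    Hypothetical helper for illustration; the window, FFT size, and the
    normalisation are assumptions, not the paper's exact definitions.
    """
    spec = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
    mag = np.abs(spec)                 # magnitude stream
    real_n = spec.real / (mag + eps)   # approximately cos(phase), in [-1, 1]
    imag_n = spec.imag / (mag + eps)   # approximately sin(phase), in [-1, 1]
    return mag, real_n, imag_n

def streams_to_spectrum(mag, real_n, imag_n):
    """Rebuild the complex spectrum; arctan2 recovers phase with no unwrapping."""
    phase = np.arctan2(imag_n, real_n)
    return mag * np.exp(1j * phase)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)   # stand-in for a 400-sample speech frame
    mag, real_n, imag_n = frame_to_streams(frame)
    rebuilt = streams_to_spectrum(mag, real_n, imag_n)
    original = np.fft.rfft(frame * np.hanning(400), n=512)
    print(np.allclose(rebuilt, original))
```

Because the two phase-related streams are bounded and vary smoothly with the underlying angle, they avoid the discontinuities of a wrapped phase value, which is one plausible reason such a representation would suit statistical modelling.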
Pages: 1383-1387
Page count: 5
Related papers
50 in total
  • [1] A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis
    Ai, Yang
    Ling, Zhen-Hua
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 839 - 851
  • [2] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
    Zangar, Imene
    Mnasri, Zied
    Colotte, Vincent
    Jouvet, Denis
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8331 - 8353
  • [3] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
    Imene Zangar
    Zied Mnasri
    Vincent Colotte
    Denis Jouvet
    [J]. Multimedia Tools and Applications, 2021, 80 : 8331 - 8353
  • [4] Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review
    Adiga, Nagaraj
    Prasanna, S. R. M.
    [J]. IETE TECHNICAL REVIEW, 2019, 36 (02) : 130 - 149
  • [5] Statistical parametric speech synthesis
    Black, Alan W.
    Zen, Heiga
    Tokuda, Keiichi
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1229 - +
  • [6] Statistical parametric speech synthesis
    Zen, Heiga
    Tokuda, Keiichi
    Black, Alan W.
    [J]. SPEECH COMMUNICATION, 2009, 51 (11) : 1039 - 1064
  • [7] Excitation modelling using epoch features for statistical parametric speech synthesis
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    [J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [8] COMPLEX CEPSTRUM AS PHASE INFORMATION IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
    Maia, Ranniery
    Akamine, Masami
    Gales, M. J. F.
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4581 - 4584
  • [9] Statistical Parametric Speech Synthesis: A Review
    Aroon, Athira
    Dhonde, S. B.
    [J]. PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015,
  • [10] An introduction to statistical parametric speech synthesis
    Simon King
    [J]. Sadhana, 2011, 36 : 837 - 852