Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis

Times cited: 11

Authors:

Espic, Felipe [1]

Valentini-Botinhao, Cassia [1]

King, Simon [1]

Affiliation:

[1] Univ Edinburgh, CSTR, Edinburgh, Midlothian, Scotland

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

speech synthesis; vocoding; speech features; phase modelling; spectral representation

DOI:

10.21437/Interspeech.2017-1647

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a simple new representation for the FFT spectrum tailored to statistical parametric speech synthesis. It consists of four feature streams that describe magnitude, phase and fundamental frequency using real numbers. The proposed feature extraction method does not attempt to decompose the speech structure (e.g., into source+filter or harmonics+noise). By avoiding the simplifications inherent in decomposition, we can dramatically reduce the "phasiness" and "buzziness" typical of most vocoders. The method uses simple and computationally cheap operations and can operate at a lower frame rate than the 200 frames per second typical of many systems. It avoids heuristics and methods requiring approximate or iterative solutions, including phase unwrapping. Two DNN-based acoustic models, one built from male and one from female speech data, were trained using the Merlin toolkit. Subjective comparisons were made with a state-of-the-art baseline that uses the STRAIGHT vocoder. In all variants tested, and for both male and female voices, the proposed method substantially outperformed the baseline. We provide source code to enable our complete system to be replicated.
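The abstract does not spell out the four streams. As an illustration only, here is a minimal sketch assuming three of them are the log magnitude plus the cosine and sine of the FFT phase (the real and imaginary parts of the unit-magnitude spectrum), with F0 as the fourth stream (not shown). Encoding phase this way keeps every feature real-valued and bounded, and sidesteps phase unwrapping because cosine and sine are unaffected by 2π jumps. The function names, `n_fft=512` and the `1e-10` floor are hypothetical choices; this is not the authors' released code, and it omits frame-rate handling and any frequency warping or dimensionality reduction the actual system may apply.

```python
import numpy as np


def frame_to_streams(frame: np.ndarray, n_fft: int = 512):
    """Split one windowed frame into three real-valued spectral streams.

    Assumption (illustrative, not the paper's code): phase is stored as
    cos(phase) and sin(phase), so no phase unwrapping is ever needed.
    """
    spec = np.fft.rfft(frame, n=n_fft)          # complex half-spectrum (n_fft // 2 + 1 bins)
    log_mag = np.log(np.maximum(np.abs(spec), 1e-10))  # magnitude stream, floored to avoid log(0)
    phase = np.angle(spec)                      # wrapped phase in (-pi, pi]
    real_ph = np.cos(phase)                     # "real" phase stream
    imag_ph = np.sin(phase)                     # "imaginary" phase stream
    return log_mag, real_ph, imag_ph


def streams_to_frame(log_mag, real_ph, imag_ph, n_fft: int = 512):
    """Rebuild a frame from the streams: magnitude times a unit phasor, then inverse FFT."""
    # Renormalise, since predicted cos/sin pairs need not lie exactly on the unit circle.
    norm = np.maximum(np.hypot(real_ph, imag_ph), 1e-10)
    phasor = (real_ph + 1j * imag_ph) / norm
    spec = np.exp(log_mag) * phasor
    return np.fft.irfft(spec, n=n_fft)          # real frame of length n_fft
```

A round trip such as `streams_to_frame(*frame_to_streams(frame))[:len(frame)]` recovers a windowed frame (of at most `n_fft` samples) up to floating-point error; the renormalisation step only matters once the cosine and sine values come from an acoustic model rather than directly from analysis.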


Pages: 1383-1387

Number of pages: 5

Similar articles (50 in total; first page shown):

[1] A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis
Ai, Yang
Ling, Zhen-Hua
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 839 - 851
[2] Duration modelling and evaluation for Arabic statistical parametric speech synthesis
Zangar, Imene
Mnasri, Zied
Colotte, Vincent
Jouvet, Denis
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 8331 - 8353
[3] Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review
Adiga, Nagaraj
Prasanna, S. R. M.
[J]. IETE TECHNICAL REVIEW, 2019, 36 (02) : 130 - 149
[4] Statistical parametric speech synthesis
Black, Alan W.
Zen, Heiga
Tokuda, Keiichi
[J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1229 - +
[5] Statistical parametric speech synthesis
Zen, Heiga
Tokuda, Keiichi
Black, Alan W.
[J]. SPEECH COMMUNICATION, 2009, 51 (11) : 1039 - 1064
[6] Excitation modelling using epoch features for statistical parametric speech synthesis
Reddy, M. Kiran
Rao, K. Sreenivasa
[J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
[7] COMPLEX CEPSTRUM AS PHASE INFORMATION IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
Maia, Ranniery
Akamine, Masami
Gales, M. J. F.
[J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4581 - 4584
[8] Statistical Parametric Speech Synthesis: A Review
Aroon, Athira
Dhonde, S. B.
[J]. PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015
[9] An introduction to statistical parametric speech synthesis
King, Simon
[J]. SADHANA, 2011, 36 : 837 - 852
