Feature Extraction for Spectral Continuity Measures in Concatenative Speech Synthesis

Times cited: 0

Authors:

Kirkpatrick, Barry [1]

O'Brien, Darragh [1]

Scaife, Ronan [1]

Affiliation:

[1] Dublin City Univ, Fac Engn & Comp, Dublin 9, Ireland

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech synthesis; unit selection; join cost; wavelet transform; phase spectra

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity are difficult to define, and standard measures often do not accurately reflect human perception of discontinuity across a concatenated join. In this study, the performance of a number of standard distance measures is compared for the task of detecting audible discontinuities in concatenated speech. Feature sets derived from the phase spectrum are also investigated. Feature extraction based on wavelet analysis is proposed to overcome some of the limitations of the standard measures tested. Receiver Operating Characteristic (ROC) curves are constructed for each measure from the results of a perceptual experiment and are used to rank the performance of each measure. Results indicate that phase spectra are comparable to magnitude spectra as a join cost for spectral continuity. Measures based on wavelet transform coefficients outperform all other measures tested.
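The abstract does not state the frame length, wavelet family, or distance function used, so the following is only a minimal Python sketch, under stated assumptions, of how such a comparison can be set up: one frame on each side of a join is mapped to magnitude-spectrum, phase-spectrum, or wavelet features; the distance between the two feature vectors serves as the join cost; and each measure is scored by the area under its ROC curve (AUC) against listeners' audible/inaudible judgements. The Haar wavelet, the Euclidean distances, the masking of near-zero bins before taking phase, the AUC summary, and every function name (`join_cost`, `roc_auc`, and so on) are illustrative assumptions, not the authors' method. The toy signals and labels are synthetic and are not data from the paper.

```python
import cmath
import math

# Illustrative sketch only: single frames per side, the Haar wavelet, Euclidean
# distances and AUC scoring are assumptions, not the paper's exact setup.


def dft(frame):
    """Naive O(N^2) DFT of a real frame, keeping bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def magnitude_features(frame):
    return [abs(c) for c in dft(frame)]


def phase_features(frame, rel_floor=1e-6):
    """Phase per bin; near-zero bins get phase 0 (their phase is just noise)."""
    spectrum = dft(frame)
    peak = max(abs(c) for c in spectrum) or 1.0
    return [cmath.phase(c) if abs(c) > rel_floor * peak else 0.0 for c in spectrum]


def haar_features(frame):
    """Full orthonormal Haar decomposition; frame length must be a power of 2."""
    n = len(frame)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    approx, details = list(frame), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def phase_distance(u, v):
    """Euclidean distance over phase differences wrapped into [-pi, pi]."""
    return math.sqrt(sum(math.atan2(math.sin(a - b), math.cos(a - b)) ** 2
                         for a, b in zip(u, v)))


def join_cost(left, right, measure):
    """Cost of joining a unit ending in frame `left` to one starting with `right`."""
    if measure == "magnitude":
        return euclidean(magnitude_features(left), magnitude_features(right))
    if measure == "phase":
        return phase_distance(phase_features(left), phase_features(right))
    if measure == "wavelet":
        return euclidean(haar_features(left), haar_features(right))
    raise ValueError(f"unknown measure: {measure}")


def roc_auc(costs, audible):
    """ROC area: chance an audible join costs more than an inaudible one (ties = 1/2)."""
    pos = [c for c, y in zip(costs, audible) if y]
    neg = [c for c, y in zip(costs, audible) if not y]
    if not pos or not neg:
        raise ValueError("need both audible and inaudible joins")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy joins (synthetic signals, hypothetical listener labels): (left, right, audible?)
smooth = [math.sin(2 * math.pi * k / 8) for k in range(8)]
shifted = [math.sin(2 * math.pi * k / 8 + 0.3) for k in range(8)]
jumped = [2.0 * math.sin(2 * math.pi * 2 * k / 8) for k in range(8)]
joins = [(smooth, smooth, False), (smooth, shifted, False), (smooth, jumped, True)]
labels = [y for _, _, y in joins]

auc = {m: roc_auc([join_cost(left, right, m) for left, right, _ in joins], labels)
       for m in ("magnitude", "phase", "wavelet")}
print(auc)
```

The AUC used here is the rank-based (Mann-Whitney) form: it counts, over every pair of one audible and one inaudible join, how often the measure assigns the audible join the higher cost. With real perceptual data, the measures would then be ranked by their AUC values or by inspecting the full ROC curves.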


Pages: 1742-1745

Number of pages: 4

50 records in total

[31] Affective word ratings for concatenative text-to-speech synthesis
Tsiakoulis, Pirros
Raptis, Spiros
Karabetsos, Sotiris
Chalamandaris, Aimilios
[J]. 20TH PAN-HELLENIC CONFERENCE ON INFORMATICS (PCI 2016), 2016
[32] Syllable-Based Concatenative Speech Synthesis for Marathi Language
Ghate, Pravin M.
Shirbahadurkar, Suresh D.
[J]. INFORMATION AND COMMUNICATION TECHNOLOGY FOR COMPETITIVE STRATEGIES, 2019, 40: 615-624
[33] Feedback Loop for Prosody Prediction in Concatenative Speech Synthesis
Latorre, Javier
Gracia, Sergio
Akamine, Masami
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 2027-2030
[34] Diphone-based concatenative speech synthesis system for Mongolian
Davaatsagaan, Munkhtuya
Paliwal, Kuldip K.
[J]. IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008: 276-279
[35] Triphone based unit selection for concatenative visual speech synthesis
Huang, FJ
Cosatto, E
Graf, HP
[J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 2037-2040
[36] LSM-based unit pruning for concatenative speech synthesis
Bellegarda, Jerome R.
[J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 521-524
[37] Joint prosody prediction and unit selection for concatenative speech synthesis
Bulyko, I
Ostendorf, M
[J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001: 781-784
[38] Speech unit selection based on target values driven by speech data in concatenative speech synthesis
Hirai, T
Tenpaku, S
Shikano, K
[J]. PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002: 43-46
[39] Feature Extraction Based on DCT and MVDR Spectral Estimation for Robust Speech Recognition
Seyedin, Sanaz
Ahadi, Mohammad
[J]. ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008: 605-608
[40] Robust speech feature extraction based on dynamic minimum subband spectral subtraction
Ma, Xin
Zhou, Weidong
Ju, Fang
[J]. INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345: 1056-1061
