Feature Extraction for Spectral Continuity Measures in Concatenative Speech Synthesis

Cited by: 0

Authors:

Kirkpatrick, Barry [1]

O'Brien, Darragh [1]

Scaife, Ronan [1]

Affiliation:

[1] Dublin City Univ, Fac Engn & Comp, Dublin 9, Ireland

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech synthesis; unit selection; join cost; wavelet transform; phase spectra

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity are difficult to define, and standard measures often do not accurately reflect human perception of discontinuity across a concatenated join. In this study the performance of a number of standard distance measures is compared for the task of detecting audible discontinuities in concatenated speech. Feature sets derived from the phase spectrum are also investigated. Feature extraction based on wavelet analysis is proposed to overcome some of the limitations of the standard measures tested. Receiver Operating Characteristic (ROC) curves are constructed for each measure from the results of a perceptual experiment and are used to rank the performance of each measure. Results indicate that phase spectra are comparable to magnitude spectra as a join cost for spectral continuity. Measures based on wavelet transform coefficients outperform all other measures tested.
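As a rough illustration of the evaluation idea in the abstract, the sketch below builds an ROC curve for a single join-cost measure from listener judgments and summarizes it by the area under the curve. It is not the authors' procedure: the function names, the toy data, and the use of area under the curve as the ranking criterion are assumptions made for this example, since the abstract only states that ROC curves were used to rank the measures.

```python
import numpy as np


def roc_curve_points(distances, audible):
    """Return (fpr, tpr) arrays for one join-cost measure.

    distances: distance value the measure assigns to each join (hypothetical data).
    audible:   1/True if listeners judged the join discontinuous, else 0/False.
    """
    distances = np.asarray(distances, dtype=float)
    audible = np.asarray(audible, dtype=bool)
    # Sweep thresholds from strictest to loosest; a join is flagged as
    # discontinuous when its distance is at or above the threshold.
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(distances))[::-1]))
    n_pos = audible.sum()
    n_neg = (~audible).sum()
    tpr = np.array([(distances[audible] >= t).sum() / n_pos for t in thresholds])
    fpr = np.array([(distances[~audible] >= t).sum() / n_neg for t in thresholds])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve (assumed ranking criterion)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy data: four joins, two of which listeners marked as audibly discontinuous.
labels = [0, 0, 1, 1]
measure_a = [0.10, 0.40, 0.35, 0.80]  # one audible join scores below a clean one
measure_b = [0.10, 0.20, 0.80, 0.90]  # separates the two groups perfectly

auc_a = area_under_curve(*roc_curve_points(measure_a, labels))
auc_b = area_under_curve(*roc_curve_points(measure_b, labels))
print(auc_a, auc_b)
```

Under this assumed criterion, a measure whose distances separate audible from inaudible joins more cleanly yields a larger area and would rank higher.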


Pages: 1742-1745

Page count: 4

50 items in total

[1] New objective distance measures for spectral discontinuities in concatenative speech synthesis
Vepa, J
King, S
Taylor, P
[J]. PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002: 223 - 226
[2] Spectral modification for concatenative speech synthesis
Wouters, J
Macon, MW
[J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000: 941 - 944
[3] Control of spectral dynamics in concatenative speech synthesis
Wouters, J
Macon, MW
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (01): 30 - 38
[4] Spectral dynamics as a source of discontinuity in concatenative speech synthesis
Kirkpatrick, Barry
O'Brien, Darragh
Scaife, Ronan
Errity, Andrew
[J]. PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, 2007: 615 - +
[5] Statistical prediction of spectral discontinuities of speech in concatenative synthesis
Pablo Trivino, Manuel
Alias, Francesc
[J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (40): 67 - 74
[6] SPEECH SEGMENT SELECTION FOR CONCATENATIVE SYNTHESIS BASED ON SPECTRAL DISTORTION MINIMIZATION
IWAHASHI, N
KAIKI, N
SAGISAKA, Y
[J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1993, E76A (11) : 1942 - 1948
[7] Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis
Nakamura, Kazuhiro
Hashimoto, Kei
Nankaku, Yoshihiko
Tokuda, Keiichi
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06): 1438 - 1448
[8] SET OF CONCATENATIVE UNITS FOR SPEECH SYNTHESIS
OLIVE, J
LIBERMAN, M
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 : S130 - S130
[9] On the detection of discontinuities in concatenative speech synthesis
Pantazis, Yannis
Stylianou, Yannis
[J]. PROGRESS IN NONLINEAR SPEECH PROCESSING, 2007, 4391 : 89 - +
[10] Discriminative training for concatenative speech synthesis
Kim, NS
Park, SS
[J]. IEEE SIGNAL PROCESSING LETTERS, 2004, 11 (01) : 40 - 43
