Feature Extraction for Spectral Continuity Measures in Concatenative Speech Synthesis

Cited by: 0
Authors
Kirkpatrick, Barry [1]
O'Brien, Darragh [1]
Scaife, Ronan [1]
Affiliations
[1] Dublin City Univ, Fac Engn & Comp, Dublin 9, Ireland
Keywords
speech synthesis; unit selection; join cost; wavelet transform; phase spectra;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity are difficult to define, and standard measures often do not accurately reflect human perception of discontinuity across a concatenated join. In this study, the performance of a number of standard distance measures is compared for the task of detecting audible discontinuities in concatenated speech. Feature sets derived from the phase spectrum are also investigated. Feature extraction based on wavelet analysis is proposed to overcome some of the limitations of the standard measures tested. Receiver Operating Characteristic (ROC) curves are constructed for each measure from the results of a perceptual experiment and are used to rank the performance of each measure. Results indicate that phase spectra are comparable to magnitude spectra as a join cost for spectral continuity. Measures based on wavelet transform coefficients outperform all other measures tested.
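The ROC-based ranking described in the abstract can be illustrated with a minimal sketch. It assumes each join has one distance score per measure and a binary perceptual label (1 if listeners reported an audible discontinuity, 0 otherwise). The function names, the toy scores, and the use of area under the curve (AUC) as the ranking criterion are all assumptions made for illustration; the abstract does not state how the curves were compared, and none of the numbers below come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points as (false positive rate, true positive rate).

    scores: distance across each join (larger is assumed to mean more discontinuous).
    labels: 1 if the join was judged audibly discontinuous, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one audible and one inaudible join")
    # Sweep the detection threshold from the largest score downwards.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Joins with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def rank_measures(measure_scores: Dict[str, Sequence[float]],
                  labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Rank measures by AUC, best first (AUC ranking is an assumption)."""
    results = [(name, auc(roc_curve(s, labels))) for name, s in measure_scores.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy data, invented for illustration only: six joins, two hypothetical measures.
labels = [1, 1, 0, 0, 1, 0]
measure_scores = {
    "measure_a": [0.9, 0.8, 0.3, 0.2, 0.7, 0.1],
    "measure_b": [0.5, 0.2, 0.6, 0.1, 0.4, 0.3],
}
ranking = rank_measures(measure_scores, labels)
```

A measure whose scores separate audible from inaudible joins well gives a curve close to the top-left corner and a larger area, so it ranks higher under this criterion.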
Pages: 1742-1745
Number of pages: 4