A comparison of spectral smoothing methods for segment concatenation based speech synthesis

Cited by: 23
Authors
Chappell, DT
Hansen, JHL
Affiliations
[1] Univ Colorado, CSLR, RSPL, Boulder, CO 80309 USA
[2] Duke Univ, Dept Elect Engn, Durham, NC 27708 USA
Keywords
speech synthesis; speech coding; spectral smoothing; spectral interpolation
DOI
10.1016/S0167-6393(01)00008-5
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
There are many scenarios in both speech synthesis and speech coding in which adjacent time frames of speech are spectrally discontinuous. This paper addresses the topic of improving concatenative speech synthesis with a limited database by proposing methods to smooth, adjust, or interpolate the spectral transitions between speech segments. The objective is to produce natural-sounding speech via segment concatenation when formants and other spectral features do not align properly. We consider several methods for adjusting the spectra at the boundaries between waveform segments. Techniques examined include optimal coupling, waveform interpolation (WI), linear predictive parameter interpolation, and psychoacoustic closure. Several of these algorithms have been previously developed for either coding or synthesis, while others are enhanced here. We also consider the connection between speech science and articulation in determining the type of smoothing appropriate for a given phoneme-to-phoneme transition. Moreover, this work incorporates a recently proposed auditory-neural based distance measure (ANBM), which employs a computational model of the auditory system to assess perceived spectral discontinuities. We demonstrate how actual ANBM scores can be used to help determine the need for smoothing. In addition, a formal evaluation of four smoothing methods, using the ANBM and extensive listener tests, reveals that smoothing can distinctly improve speech quality but, when applied inappropriately, can also degrade it. It is shown that after proper spectral smoothing, or spectral interpolation, the final synthesized speech sounds more natural and has a more continuous spectral structure. (C) 2002 Elsevier Science B.V. All rights reserved.
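The abstract lists linear predictive parameter interpolation among the smoothing techniques. It does not say which LP parameterization or interpolation schedule the authors use. The sketch below is illustrative only and rests on these assumptions:

- Each boundary frame is already described by reflection coefficients, for example from a Levinson-Durbin analysis.
- The weights change linearly over a fixed number of inserted transition frames.
- The sign convention is A(z) = 1 + sum a_i z^-i.

The function names `interpolate_reflection`, `is_stable`, and `step_up` are hypothetical. Reflection coefficients are used because any convex combination of two sets with |k| < 1 also has |k| < 1, so every interpolated synthesis filter stays stable. Line spectral frequencies are a common alternative with the same property.

```python
from typing import List, Sequence


def interpolate_reflection(k_left: Sequence[float], k_right: Sequence[float],
                           n_frames: int) -> List[List[float]]:
    """Hypothetical helper: linearly interpolate reflection coefficients
    across n_frames transition frames between the last frame of the left
    segment and the first frame of the right segment (endpoints excluded)."""
    if len(k_left) != len(k_right):
        raise ValueError("both frames must use the same LP order")
    frames = []
    for t in range(1, n_frames + 1):
        w = t / (n_frames + 1)  # weight moves from the left frame toward the right frame
        frames.append([(1.0 - w) * kl + w * kr for kl, kr in zip(k_left, k_right)])
    return frames


def is_stable(ks: Sequence[float]) -> bool:
    """The all-pole filter 1/A(z) is stable iff every |k| < 1."""
    return all(abs(k) < 1.0 for k in ks)


def step_up(ks: Sequence[float]) -> List[float]:
    """Convert reflection coefficients to LPC coefficients [1, a1, ..., ap]
    via the step-up recursion a_j <- a_j + k * a_(m+1-j)."""
    a = [1.0]
    for k in ks:
        prev = a + [0.0]  # extend the predictor order by one
        m1 = len(prev) - 1
        a = [prev[j] + k * prev[m1 - j] for j in range(len(prev))]
    return a


if __name__ == "__main__":
    # Hypothetical order-2 boundary frames with mismatched spectra.
    left, right = [0.9, -0.5], [-0.3, 0.4]
    for ks in interpolate_reflection(left, right, n_frames=3):
        print(ks, is_stable(ks), step_up(ks))
```

Each interpolated frame can then drive the LP synthesis filter over the joined region. Applying this only at joins judged discontinuous, for example by a perceptual distance such as the ANBM, matches the abstract's point that smoothing can degrade quality when applied inappropriately.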
Pages: 343-374
Page count: 32