A comparison of spectral smoothing methods for segment concatenation based speech synthesis

Cited by: 23
Authors
Chappell, DT
Hansen, JHL
Affiliations
[1] Univ Colorado, CSLR, RSPL, Boulder, CO 80309 USA
[2] Duke Univ, Dept Elect Engn, Durham, NC 27708 USA
Keywords
speech synthesis; speech coding; spectral smoothing; spectral interpolation
DOI
10.1016/S0167-6393(01)00008-5
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
There are many scenarios in both speech synthesis and coding in which adjacent time-frames of speech are spectrally discontinuous. This paper addresses the topic of improving concatenative speech synthesis with a limited database by proposing methods to smooth, adjust, or interpolate the spectral transitions between speech segments. The objective is to produce natural-sounding speech via segment concatenation when formants and other spectral features do not align properly. We consider several methods for adjusting the spectra at the boundaries between waveform segments. Techniques examined include optimal coupling, waveform interpolation (WI), linear predictive parameter interpolation, and psychoacoustic closure. Several of these algorithms have been previously developed for either coding or synthesis, while others are enhanced. We also consider the connection between speech science and articulation in determining the type of smoothing appropriate for given phoneme-phoneme transitions. Moreover, this work incorporates the use of a recently proposed auditory-neural based distance measure (ANBM), which employs a computational model of the auditory system to assess perceived spectral discontinuities. We demonstrate how actual ANBM scores can be used to help determine the need for smoothing. In addition, formal evaluation of four smoothing methods, using the ANBM and extensive listener tests, reveals that smoothing can distinctly improve the quality of speech but, when applied inappropriately, can also degrade it. It is shown that after proper spectral smoothing, or spectral interpolation, the final synthesized speech sounds more natural and has a more continuous spectral structure. (C) 2002 Elsevier Science B.V. All rights reserved.
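As a rough illustration of the parameter-interpolation idea named in the abstract, the sketch below linearly blends two spectral parameter vectors across a segment boundary. It is not the authors' algorithm: the function name `interpolate_boundary`, the choice of plain linear weighting, and the example values are all assumptions made for illustration. In practice such interpolation is often applied to a representation like line spectral frequencies, since interpolating raw linear prediction coefficients can produce unstable filters.

```python
import numpy as np

def interpolate_boundary(left_frame, right_frame, n_frames):
    """Return n_frames intermediate parameter vectors between two boundary frames.

    left_frame / right_frame: hypothetical spectral parameter vectors
    (for example, line spectral frequencies in Hz) taken from the last
    frame of one segment and the first frame of the next.
    """
    left = np.asarray(left_frame, dtype=float)
    right = np.asarray(right_frame, dtype=float)
    if left.shape != right.shape:
        raise ValueError("boundary frames must have the same shape")
    # Evenly spaced weights strictly between 0 and 1 (endpoints excluded,
    # because the boundary frames themselves are already in the segments).
    weights = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]
    # Each row is a convex combination of the two boundary frames.
    return np.array([(1.0 - w) * left + w * right for w in weights])

# Illustrative values only: two parameters that disagree at the join.
smoothed = interpolate_boundary([1000.0, 2000.0], [1200.0, 1800.0], n_frames=3)
```

Each interpolated frame is a convex combination of the two boundary frames, so if both inputs are in ascending order, every intermediate frame is too.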
Pages: 343-374
Number of pages: 32