On the use of voice descriptors for glottal source shape parameter estimation

Cited by: 8
Authors
Huber, Stefan [1]
Roebel, Axel [1]
Affiliations
[1] IRCAM CNRS UPMC STMS, Sound Anal Synth Team, F-75004 Paris, France
Source
COMPUTER SPEECH AND LANGUAGE | 2014 / Vol. 28 / Issue 05
Keywords
Glottal source; Voice quality; R-d shape parameter; LF model; Viterbi smoothing; OPEN QUOTIENT; AIR-FLOW; MODEL; PARAMETRIZATION; PRESSURE; CONTACT; QUALITY; AREA
DOI
10.1016/j.csl.2013.09.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants-Fant (LF) model describing the glottal flow and its derivative. The one-dimensional glottal source shape parameter R-d describes the transition in voice quality from a tense to a breathy voice. The parameter R-d has been derived from a statistical regression of the R waveshape parameters which parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the R-d parameter regression. Secondly, we discuss in detail the aspects of estimating the glottal source shape parameter R-d using the phase minimization paradigm. Based on the analysis of a large number of speech signals, we describe the major conditions that are likely to result in erroneous R-d estimates. Based on these findings, we investigate means of increasing the robustness of the R-d parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated R-d parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to R-d, it defines an additional prior probability to adapt the partial probabilities of the Viterbi algorithm accordingly. Finally, we evaluate the performance of the phase minimization based methods using both variants to adapt and extend the R-d regression on one synthetic test set, as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech.
The experimental findings exhibit improvements for both Viterbi approaches. © 2013 Elsevier Ltd. All rights reserved.
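The Viterbi smoothing and steering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a per-frame cost over a discrete grid of candidate R-d values (standing in for negative log probabilities), a squared-difference penalty on jumps between frames, and an optional additive `prior_cost` that plays the role of the additional prior. The names `viterbi_smooth`, `cost`, `grid`, `jump_weight` and `prior_cost` are hypothetical.

```python
import numpy as np

def viterbi_smooth(cost, grid, jump_weight=1.0, prior_cost=None):
    """Pick one grid value per frame, trading per-frame fit against jumps.

    cost:        (T, K) array, lower is better. Hypothetical input, e.g. an
                 error measure per candidate value; not the paper's measure.
    grid:        (K,) candidate parameter values.
    jump_weight: penalty per unit of squared change between adjacent frames.
    prior_cost:  optional (T, K) extra cost (assumed form of the "steering"),
                 e.g. derived from an external predictor.
    """
    cost = np.asarray(cost, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if prior_cost is not None:
        # Adding costs corresponds to multiplying probabilities.
        cost = cost + np.asarray(prior_cost, dtype=float)
    T, K = cost.shape

    # trans[i, j]: cost of moving from grid[i] to grid[j].
    trans = jump_weight * (grid[:, None] - grid[None, :]) ** 2

    acc = cost[0].copy()                 # best accumulated cost per state
    back = np.zeros((T, K), dtype=int)   # best predecessor per state
    for t in range(1, T):
        total = acc[:, None] + trans     # rows: previous state, cols: current
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(K)] + cost[t]

    # Backtrack from the cheapest final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(acc))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return grid[path]
```

With `jump_weight=0.0` the result reduces to the frame-wise minimum of `cost`; larger values suppress isolated jumps in the contour, and a non-zero `prior_cost` pulls the path toward the values favoured by the external predictor.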
Pages: 1170-1194
Page count: 25
Related papers
50 in total
  • [1] Glottal source shape parameter estimation using phase minimization variants
    Huber, Stefan
    Roebel, Axel
    Degottex, Gilles
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1642 - 1645
  • [2] Glottal Parameter Estimation by Wavelet Transform for Voice Biometry
    Gomez Vilda, Pedro
    Munoz Mulas, Cristina
    Mazaira Fernandez, Luis M.
    Rodellar Biarge, Victoria
    Martinez Olalla, Rafael
    Alvarez Marquina, Agustin
    [J]. 2011 IEEE INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2011,
  • [3] Glottal source estimation robustness - A comparison of sensitivity of voice source estimation techniques
    Drugman, Thomas
    Dubuisson, Thomas
    Moinet, Alexis
    D'Alessandro, Nicolas
    Dutoit, Thierry
    [J]. SIGMAP 2008: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2008, : 202 - 207
  • [4] Application of glottal flow descriptors for pathological voice diagnosis
    Gidaye, Girish
    Nirmal, Jagannath
    Ezzine, Kadria
    Shrivas, Avinash
    Frikha, Mondher
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (01) : 205 - 222
  • [5] Application of glottal flow descriptors for pathological voice diagnosis
    Girish Gidaye
    Jagannath Nirmal
    Kadria Ezzine
    Avinash Shrivas
    Mondher Frikha
    [J]. International Journal of Speech Technology, 2020, 23 : 205 - 222
  • [6] GLOTTAL SOURCE MODELING FOR VOICE CONVERSION
    CHILDERS, DG
    [J]. SPEECH COMMUNICATION, 1995, 16 (02) : 127 - 138
  • [7] Estimation method of glottal vocal efficiency based on conversion function of voice source
    ZOU Yuan
    WAN Mingxi
    ZHAO Shouguo
    WANG Supin
    [J]. Chinese Journal of Acoustics, 2002, (04) : 332 - 342
  • [8] Glottal Source Information for Pathological Voice Detection
    Narendra, N. P.
    Alku, Paavo
    [J]. IEEE ACCESS, 2020, 8 : 67745 - 67755
  • [9] Physiological control of low-dimensional glottal models with applications to voice source parameter matching
    Avanzini, Federico
    Maratea, Simone
    Drioli, Carlo
    [J]. ACTA ACUSTICA UNITED WITH ACUSTICA, 2006, 92 (05) : 731 - 740
  • [10] Effects of lung volume on the glottal voice source
    Iwarsson, J
    Thomasson, M
    Sundberg, J
    [J]. JOURNAL OF VOICE, 1998, 12 (04) : 424 - 433