Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Cited by: 22
Authors
Norholm, Sidsel Marie [1]
Jensen, Jesper Rindom [1]
Christensen, Mads Graesboll [1]
Affiliations
[1] Aalborg Univ, Audio Anal Lab, Architecture Design & Media Technol, DK-9000 Aalborg, Denmark
Keywords
Harmonic chirp model; parameter estimation; prewhitening; segmentation; PARAMETER-ESTIMATION; NOISE; ENHANCEMENT; TRACKING; SIGNAL
DOI
10.1109/TASLP.2016.2608948
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In speech processing, speech is often considered stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramér-Rao lower bound. Since speech varies over time, a fixed segment length is not optimal, and we propose segmenting the signal based on the maximum a posteriori criterion. With this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model. For the signal under test, the average segment length is 24.4 ms for the chirp model and 17.1 ms for the traditional harmonic model. This suggests that the chirp model fits the speech signal better than the harmonic model does. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed.
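As an illustration of the estimation idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a complex-valued harmonic chirp signal whose l-th harmonic has phase l(ω0·n + k·n²/2), a known number of harmonics, and a time index centered on the segment. Under white Gaussian noise, maximizing the likelihood over the fundamental frequency ω0 and chirp rate k is equivalent to nonlinear least squares, which is done here by a brute-force grid search. The function names, grids, and parameter values are hypothetical, chosen only for this example.

```python
import numpy as np


def chirp_basis(omega0, k, n, num_harmonics):
    """Matrix whose column l-1 is exp(j * l * (omega0 * n + 0.5 * k * n**2))."""
    phase = omega0 * n + 0.5 * k * n**2
    harmonics = np.arange(1, num_harmonics + 1)
    return np.exp(1j * np.outer(phase, harmonics))


def synth_harmonic_chirp(omega0, k, n, amplitudes, noise_std=0.0, seed=0):
    """Synthesize a complex harmonic chirp in white Gaussian noise (test signal)."""
    rng = np.random.default_rng(seed)
    amplitudes = np.asarray(amplitudes, dtype=complex)
    clean = chirp_basis(omega0, k, n, len(amplitudes)) @ amplitudes
    noise = rng.standard_normal(len(n)) + 1j * rng.standard_normal(len(n))
    return clean + noise_std * noise


def projected_energy(x, omega0, k, n, num_harmonics):
    """Energy of x explained by the chirp basis (least-squares amplitudes)."""
    Z = chirp_basis(omega0, k, n, num_harmonics)
    amps, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return float(np.linalg.norm(Z @ amps) ** 2)


def estimate_f0_and_chirp(x, n, num_harmonics, omega0_grid, k_grid):
    """Grid search for the (omega0, k) pair that maximizes the projected energy.

    For white Gaussian noise this coincides with the maximum likelihood
    estimate restricted to the grid points.
    """
    best_energy, best_omega0, best_k = -np.inf, None, None
    for omega0 in omega0_grid:
        for k in k_grid:
            energy = projected_energy(x, omega0, k, n, num_harmonics)
            if energy > best_energy:
                best_energy, best_omega0, best_k = energy, omega0, k
    return best_omega0, best_k


if __name__ == "__main__":
    fs = 8000.0                               # assumed sampling rate in Hz
    N = 240                                   # 30 ms segment at 8 kHz
    n = np.arange(N) - (N - 1) / 2            # centered time index
    true_omega0 = 2 * np.pi * 150.0 / fs      # 150 Hz fundamental, rad/sample
    true_k = 1e-4                             # chirp rate, rad/sample^2
    x = synth_harmonic_chirp(true_omega0, true_k, n, [1.0, 0.8, 0.6], 0.05)

    omega0_grid = true_omega0 + np.linspace(-0.01, 0.01, 21)
    k_grid = np.linspace(-2e-4, 2e-4, 9)
    omega0_hat, k_hat = estimate_f0_and_chirp(x, n, 3, omega0_grid, k_grid)
    print("f0 estimate (Hz):", omega0_hat * fs / (2 * np.pi))
    print("chirp rate estimate (rad/sample^2):", k_hat)
```

Setting `k_grid` to the single value 0 reduces the sketch to the traditional harmonic model, which gives a simple way to compare how well the two models explain a given segment. A grid search is used only for clarity; a practical estimator would refine the estimate with a finer or iterative search.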
Pages: 2354-2367
Page count: 14