Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Cited by: 22
Authors
Norholm, Sidsel Marie [1]
Jensen, Jesper Rindom [1]
Christensen, Mads Graesboll [1]
Affiliations
[1] Aalborg Univ, Audio Anal Lab, Architecture Design & Media Technol, DK-9000 Aalborg, Denmark
Keywords
Harmonic chirp model; parameter estimation; prewhitening; segmentation; PARAMETER-ESTIMATION; NOISE; ENHANCEMENT; TRACKING; SIGNAL;
DOI
10.1109/TASLP.2016.2608948
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In speech processing, speech is often considered stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramér-Rao lower bound. Since the speech varies over time, a fixed segment length is not optimal, and we propose segmenting the signal based on the maximum a posteriori criterion. Using this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model. For the signal under test, the average segment length is 24.4 ms for the chirp model and 17.1 ms for the traditional harmonic model. This suggests that the chirp model fits the speech signal better than the harmonic model does. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed.
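As an illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a harmonic chirp signal whose l-th harmonic has instantaneous frequency l(f0 + rate·t), and it uses the fact that under white Gaussian noise the maximum likelihood estimate coincides with a nonlinear least-squares fit. A plain grid search stands in for whatever optimization the paper uses; prewhitening and the MAP segmentation are not covered. All names (chirp_basis, nls_cost, estimate_f0_and_rate) and all numeric values are hypothetical.

```python
import numpy as np

def chirp_basis(n, f0, rate, num_harmonics, fs):
    """Cos/sin basis for harmonics l = 1..L of a linear chirp.

    n is a (possibly fractional) sample index, f0 is in Hz, rate in Hz/s.
    """
    t = n / fs
    phase = 2 * np.pi * (f0 * t + 0.5 * rate * t**2)
    harmonics = np.arange(1, num_harmonics + 1)
    arg = np.outer(phase, harmonics)
    return np.hstack([np.cos(arg), np.sin(arg)])

def nls_cost(x, Z):
    """Residual energy after the least-squares fit of the linear amplitudes."""
    amp, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return np.sum((x - Z @ amp) ** 2)

def estimate_f0_and_rate(x, fs, num_harmonics, f0_grid, rate_grid):
    """Grid search for the (f0, rate) pair that minimizes the NLS cost."""
    # Centering the time index makes f0 the frequency at the segment midpoint.
    n = np.arange(len(x)) - (len(x) - 1) / 2
    best_cost, best_f0, best_rate = np.inf, None, None
    for f0 in f0_grid:
        for rate in rate_grid:
            cost = nls_cost(x, chirp_basis(n, f0, rate, num_harmonics, fs))
            if cost < best_cost:
                best_cost, best_f0, best_rate = cost, f0, rate
    return best_f0, best_rate

# Synthetic 30 ms test segment (all values are illustrative assumptions).
fs, N, L = 8000.0, 240, 3
true_f0, true_rate = 150.0, 400.0
rng = np.random.default_rng(0)
n = np.arange(N) - (N - 1) / 2
t = n / fs
phase = 2 * np.pi * (true_f0 * t + 0.5 * true_rate * t**2)
amps, phis = [1.0, 0.5, 0.25], [0.3, -1.0, 2.0]
x = sum(a * np.cos((l + 1) * phase + p) for l, (a, p) in enumerate(zip(amps, phis)))
x = x + 0.005 * rng.standard_normal(N)

f0_hat, rate_hat = estimate_f0_and_rate(
    x, fs, L, np.arange(140.0, 161.0, 1.0), np.arange(-800.0, 801.0, 100.0)
)
print(f0_hat, rate_hat)
```

Setting the rate grid to the single value 0 reduces this sketch to the traditional harmonic model, which is the comparison the abstract draws.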
Pages: 2354-2367
Page count: 14
Related papers
50 in total
  • [31] Voiced speech as response of a self-consistent fundamental drive
    Drepper, Friedhelm R.
    SPEECH COMMUNICATION, 2007, 49 (03) : 186 - 200
  • [32] Experience with a second language affects the use of fundamental frequency in speech segmentation
    Tremblay, Annie
    Namjoshi, Jui
    Spinelli, Elsa
    Broersma, Mirjam
    Cho, Taehong
    Kim, Sahyang
    Martinez-Garcia, Maria Teresa
    Connell, Katrina
    PLOS ONE, 2017, 12 (07):
  • [33] Estimation of the instantaneous pitch of speech
    Resch, Barbara
    Nilsson, Mattias
    Ekman, Anders
    Kleijn, W. Bastiaan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03) : 813 - 822
  • [34] Fundamental frequency estimation of speech signals using MUSIC algorithm
    Murakami, Takahiro
    Ishida, Yoshihisa
    Acoustical Science and Technology, 2001, 22 (04) : 293 - 297
  • [35] Estimation of the fundamental frequency of the speech signal modeled by the SYMPES method
    Milivojevic, Zoran N.
    Mirkovic, Milorad Dj.
    AEU-INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATIONS, 2009, 63 (03) : 200 - 208
  • [36] Performance of an Event-Based Instantaneous Fundamental Frequency Estimator for Distant Speech Signals
    Seshadri, Guruprasad
    Yegnanarayana, B.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (07) : 1853 - 1864
  • [37] PROGRAMS FOR THE ESTIMATION OF FUNDAMENTAL-FREQUENCY, AMPLITUDE, AND VOICING OF SPEECH
    HEYMAN, R
    BIRD, RJ
    HEYMAN, RL
    HARDING, J
    BEHAVIOR RESEARCH METHODS & INSTRUMENTATION, 1981, 13 (06) : 760 - 760
  • [38] Filtering of a dissonant frequency based on improved fundamental frequency estimation for speech enhancement
    Jeon, B
    Kang, S
    Baek, SJ
    Sung, KM
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2003, E86A (08) : 2063 - 2064
  • [39] VARIATIONS OF THE FUNDAMENTAL FREQUENCY IN POLISH VOICED CONSONANTS.
    Matuszkina, Olga
    1978, 3 (02) : 105 - 119
  • [40] Source-filter separation for nonstationary voiced speech based on sinusoidal representation
    Ito, Masashi
    Ohara, Keiji
    Ito, Akinori
    Yano, Masafumi
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2010, 31 (02) : 181 - 184