Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Cited by: 22
Authors
Norholm, Sidsel Marie [1]
Jensen, Jesper Rindom [1]
Christensen, Mads Graesboll [1]
Affiliations
[1] Aalborg Univ, Audio Anal Lab, Architecture Design & Media Technol, DK-9000 Aalborg, Denmark
Keywords
Harmonic chirp model; parameter estimation; prewhitening; segmentation; PARAMETER-ESTIMATION; NOISE; ENHANCEMENT; TRACKING; SIGNAL;
DOI
10.1109/TASLP.2016.2608948
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In speech processing, speech is often considered stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model and show that it reaches the Cramér-Rao lower bound. Since speech varies over time, a fixed segment length is not optimal, and we propose segmenting the signal based on the maximum a posteriori criterion. Using this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model. For the signal under test, the average segment length is 24.4 ms for the chirp model and 17.1 ms for the traditional harmonic model. This suggests that the chirp model fits the speech signal better than the harmonic model. The methods assume white Gaussian noise, and two prewhitening filters are therefore also proposed.
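As a rough illustration of the estimation problem the abstract describes, the following Python sketch fits a harmonic chirp model to a short segment by grid search. It is not the authors' implementation. The complex-valued signal model, the symmetric time index, the grid search, and all names (`chirp_basis`, `projection_energy`, `estimate_f0_and_chirp`) are assumptions made for this example. It relies only on the standard result that, under white Gaussian noise and with amplitudes entering linearly, maximum likelihood estimation reduces to nonlinear least squares.

```python
import numpy as np


def chirp_basis(t, f0, chirp_rate, num_harmonics):
    """Return an (N, L) matrix whose column l is the l-th harmonic chirp.

    Assumed model: harmonic l has phase 2*pi*l*(f0*t + 0.5*chirp_rate*t**2),
    with f0 in Hz, chirp_rate in Hz/s, and t in seconds.
    """
    harmonics = np.arange(1, num_harmonics + 1)
    base_phase = 2 * np.pi * (f0 * t + 0.5 * chirp_rate * t**2)
    return np.exp(1j * np.outer(base_phase, harmonics))


def projection_energy(x, Z):
    """Energy of the least-squares fit of x onto the columns of Z."""
    amps, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return float(np.linalg.norm(Z @ amps) ** 2)


def estimate_f0_and_chirp(x, fs, num_harmonics, f0_grid, chirp_grid):
    """Grid-search nonlinear least squares over (f0, chirp_rate).

    The time axis is centred on the segment, so f0 is the fundamental
    frequency at the segment midpoint (an assumption of this sketch).
    """
    t = (np.arange(len(x)) - (len(x) - 1) / 2) / fs
    best_energy, best_f0, best_rate = -np.inf, None, None
    for f0 in f0_grid:
        for rate in chirp_grid:
            energy = projection_energy(x, chirp_basis(t, f0, rate, num_harmonics))
            if energy > best_energy:
                best_energy, best_f0, best_rate = energy, f0, rate
    return best_f0, best_rate


# Synthetic 30 ms segment (illustrative values, not data from the paper).
fs, N, L = 8000, 240, 5
t = (np.arange(N) - (N - 1) / 2) / fs
rng = np.random.default_rng(0)
amps = np.array([1.0, 0.8, 0.6, 0.4, 0.3]) * np.exp(1j * rng.uniform(0, 2 * np.pi, L))
x = chirp_basis(t, 150.0, 400.0, L) @ amps
x = x + 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

f0_hat, rate_hat = estimate_f0_and_chirp(
    x, fs, L, np.arange(140.0, 161.0, 1.0), np.arange(-1000.0, 1001.0, 200.0)
)
print(f0_hat, rate_hat)
```

Setting the chirp grid to the single value 0 turns the same routine into a fit of the traditional harmonic model, which gives a simple way to compare the two models on one segment. A practical estimator would refine the grid maximum and handle model order selection, neither of which is attempted here.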
Pages: 2354-2367
Number of pages: 14
Related papers
50 in total
  • [41] NONTACTILE ESTIMATION OF GLOTTAL EXCITATION CHARACTERISTICS OF VOICED SPEECH
    BRIESEMAN, NP
    THORPE, CW
    BATES, RHT
    IEE PROCEEDINGS-A-SCIENCE MEASUREMENT AND TECHNOLOGY, 1987, 134 (10): : 807 - 813
  • [42] Joint estimation of the voiced component and spectrum of a speech signal
    Holmes, WH
    Malik, N
    GLOBECOM 98: IEEE GLOBECOM 1998 - CONFERENCE RECORD, VOLS 1-6: THE BRIDGE TO GLOBAL INTEGRATION, 1998, : 1315 - 1319
  • [43] A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation
    Hu, Guoning
    Wang, DeLiang
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (08): : 2067 - 2079
  • [44] LDV-system based on optimal estimation of instantaneous Doppler frequency
    Sobolev, VS
    Shcherbachenko, AM
    Kashcheeva, GA
    Utkin, EN
    Stolpovsky, AA
    Skurlatov, AI
    Filimonenko, IV
    SEVENTH INTERNATIONAL SYMPOSIUM ON LASER METROLOGY APPLIED TO SCIENCE, INDUSTRY, AND EVERYDAY LIFE, PTS 1 AND 2, 2002, 4900 : 1171 - 1177
  • [45] Optimal Time-Frequency Distribution for Instantaneous Frequency Estimation of Signals With Known IF Patterns
    Seddighi, Zahra
    Taban, Mohammad Reza
    Gazor, Saeed
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024, 60 (04) : 5458 - 5466
  • [46] Joint estimation of the voiced component and spectrum of a speech signal
    Holmes, W.Harvey
    Malik, Najam
    Conference Record / IEEE Global Telecommunications Conference, 1998, 3 : 1315 - 1319
  • [47] HARMONICS ESTIMATION BASED ON INSTANTANEOUS FREQUENCY AND ITS APPLICATION TO PITCH DETERMINATION OF SPEECH
    ABE, T
    KOBAYASHI, T
    IMAI, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (09) : 1188 - 1194
  • [48] Joint fundamental frequency and order estimation using optimal filtering
    Christensen, Mads Graesboll
    Hojvang, Jesper Lisby
    Jakobsson, Andreas
    Jensen, Soren Holdt
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2011,
  • [49] A SINGLE SNAPSHOT OPTIMAL FILTERING METHOD FOR FUNDAMENTAL FREQUENCY ESTIMATION
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    Jensen, Soren Holdt
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4272 - 4275
  • [50] Joint fundamental frequency and order estimation using optimal filtering
    Mads Græsbøll Christensen
    Jesper Lisby Højvang
    Andreas Jakobsson
    Søren Holdt Jensen
    EURASIP Journal on Advances in Signal Processing, 2011