Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Cited by: 22
Authors
Norholm, Sidsel Marie [1]
Jensen, Jesper Rindom [1]
Christensen, Mads Graesboll [1]
Affiliations
[1] Aalborg Univ, Audio Anal Lab, Architecture Design & Media Technol, DK-9000 Aalborg, Denmark
Keywords
Harmonic chirp model; parameter estimation; prewhitening; segmentation; PARAMETER-ESTIMATION; NOISE; ENHANCEMENT; TRACKING; SIGNAL;
DOI
10.1109/TASLP.2016.2608948
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In speech processing, speech is often considered stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramér-Rao lower bound. Since speech varies over time, a fixed segment length is not optimal, and we propose segmenting the signal based on the maximum a posteriori criterion. Using this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model. For the signal under test, the average segment length is 24.4 ms for the chirp model and 17.1 ms for the traditional harmonic model. This suggests that the chirp model fits the speech signal better than the harmonic model does. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed.
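As a rough illustration of the estimation idea in the abstract, the sketch below fits a harmonic chirp model to a synthetic complex signal by grid search. It relies on the standard result that, under white Gaussian noise, maximum likelihood estimation of the nonlinear parameters reduces to nonlinear least squares. This is not the authors' implementation: the function names, the time index centred on the segment, the known number of harmonics, the grids, and the test signal are all assumptions made for the example.

```python
import numpy as np

def chirp_basis(n, w0, k, num_harmonics):
    """Columns exp(j*l*(w0*n + 0.5*k*n**2)) for l = 1..num_harmonics."""
    harmonics = np.arange(1, num_harmonics + 1)
    phase = w0 * n + 0.5 * k * n**2          # linear chirp: frequency w0 + k*n
    return np.exp(1j * np.outer(phase, harmonics))

def projected_energy(x, n, w0, k, num_harmonics):
    """Energy of the least-squares fit of x onto the chirp basis."""
    Z = chirp_basis(n, w0, k, num_harmonics)
    amps, *_ = np.linalg.lstsq(Z, x, rcond=None)
    fit = Z @ amps
    return float(np.real(np.vdot(fit, fit)))

def grid_search(x, n, w0_grid, k_grid, num_harmonics):
    """Return the (w0, k) grid point that maximizes the projected energy."""
    best_energy, best_w0, best_k = -np.inf, None, None
    for w0 in w0_grid:
        for k in k_grid:
            energy = projected_energy(x, n, w0, k, num_harmonics)
            if energy > best_energy:
                best_energy, best_w0, best_k = energy, w0, k
    return best_w0, best_k

# Hypothetical test signal: 30 ms at 8 kHz, 150 Hz fundamental, 3 harmonics.
fs, N, L = 8000, 240, 3
n = np.arange(N) - N // 2                    # assumed: time index centred on the segment
w0_grid = 2 * np.pi * np.arange(100, 201, 5) / fs   # rad/sample
k_grid = np.linspace(-3e-4, 3e-4, 13)               # rad/sample^2
w0_true, k_true = w0_grid[10], k_grid[8]            # 150 Hz and 1e-4, both on the grid

rng = np.random.default_rng(0)
true_amps = np.array([1.0, 0.5, 0.3]) * np.exp(1j * np.array([0.3, -1.0, 2.0]))
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
x = chirp_basis(n, w0_true, k_true, L) @ true_amps + noise

w0_hat, k_hat = grid_search(x, n, w0_grid, k_grid, L)
print(w0_hat * fs / (2 * np.pi), k_hat)      # estimated fundamental in Hz, chirp rate
```

Restricting `k_grid` to the single value `0.0` turns the same search into a fit of the traditional harmonic model, which is the baseline the abstract compares against. A practical estimator would refine the grid maximum with a local search rather than stopping at grid resolution.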
Pages: 2354-2367
Page count: 14
Related papers
50 in total
  • [1] Event-Based Method for Instantaneous Fundamental Frequency Estimation from Voiced Speech Based on Eigenvalue Decomposition of the Hankel Matrix
    Jain, Pooja
    Pachori, Ram Bilas
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1467 - 1482
  • [2] Event-Based Instantaneous Fundamental Frequency Estimation From Speech Signals
    Yegnanarayana, B.
    Murty, K. Sri Rama
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (04) : 614 - 624
  • [3] Instantaneous Fundamental Frequency Estimation from Speech using Fourier Decomposition Method
    Singh, Pushpendra
    Singhal, Amit
    Fatimah, Binish
    2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022,
  • [4] NONSTATIONARY SPECTRAL MODELING OF VOICED SPEECH
    ALMEIDA, LB
    TRIBOLET, JM
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1983, 31 (03) : 664 - 678
  • [5] Instantaneous fundamental frequency estimation of speech signals using DESA in low-frequency region
    Rathore, Purshottam Singh
    Pachori, Ram Bilas
    2013 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICSC), 2013, : 470 - 473
  • [6] Instantaneous fundamental frequency estimation of speech signals using tunable-Q wavelet transform
    Nishad, Anurag
    Pachori, Ram Bilas
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 157 - 161
  • [7] FUNDAMENTAL-FREQUENCY DETERMINATION BASED ON INSTANTANEOUS FREQUENCY ESTIMATION
    QIU, LJ
    YANG, HY
    KOH, SN
    SIGNAL PROCESSING, 1995, 44 (02) : 233 - 241
  • [8] Fundamental frequency estimation based on instantaneous frequency amplitude spectrum
    Tanaka, T
    Kobayashi, T
    Arifianto, D
    Masuko, T
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 329 - 332
  • [9] Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency
    David, Marion
    Lavandier, Mathieu
    Grimault, Nicolas
    Oxenham, Andrew J.
    HEARING RESEARCH, 2017, 344 : 235 - 243
  • [10] On the Estimation of Fundamental Frequency From Nonstationary Noisy Speech Signals Based on the Hilbert-Huang Transform
    Zao, L.
    Coelho, R.
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (02) : 248 - 252