Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Times cited: 22

Authors:

Norholm, Sidsel Marie [1]

Jensen, Jesper Rindom [1]

Christensen, Mads Graesboll [1]

Affiliation:

[1] Aalborg Univ, Audio Anal Lab, Architecture Design & Media Technol, DK-9000 Aalborg, Denmark

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2016 / Vol. 24 / No. 12

Keywords:

Harmonic chirp model; parameter estimation; prewhitening; segmentation; PARAMETER-ESTIMATION; NOISE; ENHANCEMENT; TRACKING; SIGNAL;

DOI:

10.1109/TASLP.2016.2608948

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In speech processing, speech is often considered stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by using a linear chirp model to describe the speech signal. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model, and show that it reaches the Cramér-Rao lower bound. Since speech varies over time, a fixed segment length is not optimal, and we propose segmenting the signal based on the maximum a posteriori criterion. Using this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model. For the signal under test, the average segment length is 24.4 ms for the chirp model and 17.1 ms for the traditional harmonic model. This suggests that the chirp model fits the speech signal better than the harmonic model does. The methods are based on an assumption of white Gaussian noise, and, therefore, two prewhitening filters are also proposed.
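The estimation idea in the abstract can be illustrated with a small sketch. It assumes a real-valued harmonic chirp model in which harmonic l has phase l(w0*n + k*n^2/2), so its instantaneous frequency is l(w0 + k*n), plus white Gaussian noise and a known number of harmonics L. Under these assumptions, maximizing the likelihood over (w0, k) reduces to maximizing x^T Z (Z^T Z)^{-1} Z^T x, the energy of the signal projected onto the harmonic chirp subspace spanned by the columns of Z. The code does this by brute-force grid search. The function names, grids, and synthetic test signal are hypothetical choices for illustration, not the authors' implementation, and the MAP segmentation and prewhitening steps are not shown.

```python
import numpy as np


def harmonic_chirp_basis(n, w0, k, L):
    """Real basis (assumed model): harmonic l has phase l*(w0*n + k*n**2/2).

    n: centred sample indices; w0: fundamental frequency (rad/sample);
    k: fundamental chirp rate (rad/sample^2); L: number of harmonics.
    """
    phase = w0 * n + 0.5 * k * n ** 2
    cols = []
    for l in range(1, L + 1):
        cols.append(np.cos(l * phase))
        cols.append(np.sin(l * phase))
    return np.column_stack(cols)


def projected_energy(x, Z):
    """x^T Z (Z^T Z)^{-1} Z^T x, computed via a least-squares amplitude fit."""
    amps, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return float(x @ (Z @ amps))


def estimate_f0_chirp(x, fs, L, f0_grid_hz, chirp_grid_hz_per_s):
    """Grid-search NLS estimate of (f0 in Hz, chirp rate in Hz/s) at the segment centre.

    With white Gaussian noise and known L, this maximises the likelihood on the grid.
    """
    N = len(x)
    n = np.arange(N) - (N - 1) / 2.0  # centred index: w0 refers to the segment centre
    best_cost, best_f0, best_c = -np.inf, None, None
    for f0 in f0_grid_hz:
        w0 = 2 * np.pi * f0 / fs  # Hz -> rad/sample
        for c in chirp_grid_hz_per_s:
            k = 2 * np.pi * c / fs ** 2  # Hz/s -> rad/sample^2
            cost = projected_energy(x, harmonic_chirp_basis(n, w0, k, L))
            if cost > best_cost:
                best_cost, best_f0, best_c = cost, f0, c
    return best_f0, best_c


# Usage (hypothetical test signal): a 30 ms voiced-like segment at 8 kHz with rising pitch.
fs, N, L = 8000, 240, 5
rng = np.random.default_rng(0)
n = np.arange(N) - (N - 1) / 2.0
true_f0, true_c = 150.0, 500.0
Z_true = harmonic_chirp_basis(n, 2 * np.pi * true_f0 / fs, 2 * np.pi * true_c / fs ** 2, L)
x = Z_true @ rng.uniform(0.2, 1.0, 2 * L) + 0.1 * rng.standard_normal(N)

f0_hat, c_hat = estimate_f0_chirp(
    x, fs, L, np.arange(100.0, 201.0, 1.0), np.arange(-2000.0, 2001.0, 100.0)
)
print(f0_hat, c_hat)
```

Centring the time index makes w0 the instantaneous fundamental frequency at the middle of the segment and roughly decouples the frequency and chirp-rate estimates. A practical estimator would refine the coarse grid maximum, for example with a local optimizer.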

Pages: 2354-2367

Number of pages: 14