Noise robust F0 determination and epoch-marking algorithms

Cited by: 8

Authors:

Kotnik, Bojan [1]

Hoege, Harald [2]

Kacic, Zdravko [3]

Affiliations:

[1] ULTRA Doo, Res Ctr Maribor, SI-2000 Maribor, Slovenia

[2] Siemens AG, Corp Technol, Profess Speech Proc, D-81739 Munich, Germany

[3] Univ Maribor, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia

Source:

SIGNAL PROCESSING | 2009 / Vol. 89 / No. 12

Keywords:

Fundamental frequency; Glottal closure instant; Epoch marking; Voicing detection; Artificial neural network; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH DETERMINATION; EXTRACTION; SPEECH

DOI:

10.1016/j.sigpro.2009.04.017

CLC classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline code:

0808; 0809

Abstract:

This paper presents CPDMA, a combined pitch frequency (F0) determination and epoch (pitch period) marking procedure based on merged normalized forward-backward correlation. The algorithm consists of several processing steps: preprocessing of the input speech signal, voicing detection using artificial neural networks, F0 determination based on normalized correlation, F0 contour postprocessing using partial Viterbi traceback, and finally, epoch (pitch period) marking. To evaluate CPDMA against other algorithms, a manually segmented reference database for pitch determination and pitch marking algorithms (PDA/PMA) was created from the real-life SPEECON Spanish speech database. A set of criteria was proposed to evaluate the performance of any PDA/PMA or voicing detection algorithm objectively and compactly. The performance of CPDMA was compared with that of the well-known, publicly available PRAAT toolkit. The PDA and PMA performance achieved with CPDMA was significantly better than that of PRAAT in all considered configurations: autocorrelation method (PRAAT_AC), cross-correlation method (PRAAT_CC), SHS (PRAAT_SHS), and point process (PRAAT_PP). The superior noise robustness of CPDMA comes at the expense of a more complex algorithm and, consequently, a worse real-time factor than PRAAT. (C) 2009 Elsevier B.V. All rights reserved.
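The abstract names the core of the F0 stage, merged normalized forward-backward correlation, but gives no formulas. The Python sketch below illustrates the general idea for a single analysis frame under our own assumptions: the forward score correlates the frame x[c:c+W] with x[c+τ:c+τ+W], the backward score correlates it with x[c-τ:c-τ+W], and the two scores are merged by simple averaging. The function names (`normalized_corr`, `estimate_f0`), the 60-400 Hz search range, and the tie-breaking margin are hypothetical choices, not CPDMA's published parameters; voicing detection, Viterbi postprocessing, and epoch marking are not shown.

```python
import math

def normalized_corr(a, b):
    """Normalized correlation (cosine similarity) of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def estimate_f0(signal, fs, center, win_len, f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, score) for the frame starting at `center`.

    Hypothetical sketch: forward and backward normalized correlations are
    merged by averaging; CPDMA's actual merging rule may differ.
    """
    tau_min = int(fs / f0_max)  # shortest lag = highest F0 searched
    tau_max = int(fs / f0_min)  # longest lag = lowest F0 searched
    ref = signal[center:center + win_len]
    best_tau, best_score = None, -math.inf
    for tau in range(tau_min, tau_max + 1):
        fwd_start, bwd_start = center + tau, center - tau
        # Skip lags whose forward or backward segment falls outside the signal.
        if bwd_start < 0 or fwd_start + win_len > len(signal):
            continue
        fwd = normalized_corr(ref, signal[fwd_start:fwd_start + win_len])
        bwd = normalized_corr(ref, signal[bwd_start:bwd_start + win_len])
        score = 0.5 * (fwd + bwd)  # merged forward-backward score
        # A longer lag must win by a small margin, so near-ties between the
        # true period T and its multiples 2T, 3T, ... keep the shorter lag.
        if best_tau is None or score > best_score + 1e-6:
            best_tau, best_score = tau, score
    if best_tau is None:
        return 0.0, 0.0  # frame too close to the signal edges
    return fs / best_tau, best_score

# Usage: a 200 Hz sine sampled at 8 kHz has a period of exactly 40 samples.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(4000)]
f0, score = estimate_f0(x, fs, center=2000, win_len=320)
print(f0, round(score, 3))  # expected: 200.0 1.0
```

Averaging the two directions is only one plausible merge; taking the maximum would be another. Per the abstract, a per-frame estimate like this is only one stage: voicing decisions and contour smoothing (partial Viterbi traceback) come before epoch marking, and a low merged score on its own is at best a crude hint that a frame is unvoiced.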

Pages: 2555-2569

Number of pages: 15

50 records in total (items 41-50 shown):

[41] Robust f0 extraction from monophonic signals using adaptive sub-band filtering
Rengaswamy, Pradeep
Reddy, M. Kiran
Rao, Krothapalli Sreenivasa
Dasgupta, Pallab
SPEECH COMMUNICATION, 2020, 116: 77-85
[42] Variation of the acoustic parameters: f0, Jitter, Shimmer and Alpha ratio in relation with different background noise levels
Marsano-Cornejo, Maria Jose
Roco-Videla, Angel
ACTA OTORRINOLARINGOLOGICA ESPANOLA, 2023, 74 (04): 219-225
[43] Combining F0 and non-negative constraint robust principal component analysis for singing voice separation
Li, Feng
Akagi, Masato
SIGNAL PROCESSING, 2020, 170
[44] On how once subtracted dispersion relations lead to a precise determination of ππ scattering and the f0(600) parameters
Kaminski, Robert
Garcia-Martin, R.
Pelaez, J.
Yndurain, F.
HADRON 2009, 2010, 1257: 267+
[45] F0 Contour Estimation using ELS-based Robust Time-Varying Complex Speech Analysis
Funaki, Keiichi
2011 IEEE DIGITAL SIGNAL PROCESSING WORKSHOP AND IEEE SIGNAL PROCESSING EDUCATION WORKSHOP (DSP/SPE), 2011: 313-316
[46] Robust F0 Estimation Based on Log-Time Scale Autocorrelation and Its Application to Mandarin Tone Recognition
Kida, Yusuke
Sakai, Masaru
Masuko, Takashi
Kawamura, Akinori
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 2931-2934
[47] Sequential F0 comparisons between resolved and unresolved harmonics: No evidence for translation noise between two pitch mechanisms
Micheyl, Christophe
Oxenham, Andrew J.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2004, 116 (05): 3038-3050
[49] Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech
Szaszak, Gyorgy
Tundik, Mate Akos
Gerazov, Branislav
Gjoreski, Aleksandar
SPEECH AND COMPUTER, 2016, 9811: 165-173
[50] Roles of Voice Onset Time and F0 in Stop Consonant Voicing Perception: Effects of Masking Noise and Low-Pass Filtering
Winn, Matthew B.
Chatterjee, Monita
Idsardi, William J.
JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2013, 56 (04): 1097-1107