Noise robust F0 determination and epoch-marking algorithms

Cited by: 8
Authors
Kotnik, Bojan [1]
Hoege, Harald [2]
Kacic, Zdravko [3]
Affiliations
[1] ULTRA Doo, Res Ctr Maribor, SI-2000 Maribor, Slovenia
[2] Siemens AG, Corp Technol, Profess Speech Proc, D-81739 Munich, Germany
[3] Univ Maribor, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia
Keywords
Fundamental frequency; Glottal closure instant; Epoch marking; Voicing detection; Artificial neural network; Fundamental-frequency estimation; Pitch determination; Extraction; Speech
DOI
10.1016/j.sigpro.2009.04.017
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
This paper presents CPDMA, a combined pitch frequency (F0) determination and epoch (pitch period) marking algorithm that uses merged normalized forward-backward correlation. The algorithm consists of several processing steps: preprocessing of the input speech signal, voicing detection using artificial neural networks, F0 determination based on normalized correlation, F0 contour postprocessing with partial Viterbi traceback, and, finally, epoch (pitch period) marking. To evaluate CPDMA against other algorithms, a manually segmented PDA/PMA reference database was created from the real-life SPEECON Spanish speech database. A set of criteria was proposed to evaluate the performance of any PDA/PMA or voicing detection algorithm objectively and compactly. The performance of CPDMA was compared with that of the well-known, publicly available PRAAT toolkit. The PDA and PMA performance achieved with CPDMA was significantly better than that of PRAAT in all considered configurations: the autocorrelation method (PRAAT_AC), the cross-correlation method (PRAAT_CC), SHS (PRAAT_SHS), and the point process (PRAAT_PP). The superior noise robustness of CPDMA comes at the expense of a more complex algorithm and, consequently, a worse real-time factor than PRAAT. (C) 2009 Elsevier B.V. All rights reserved.
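The correlation-based F0 stage summarized in the abstract can be illustrated with a small sketch. This is not the CPDMA implementation: the merge rule (a plain average of the forward and backward normalized correlations), the search range `f0_min`/`f0_max`, the peak-picking `ratio`, and all function names are hypothetical choices for illustration only. Voicing detection (done with artificial neural networks in the paper), Viterbi postprocessing, and epoch marking are not shown.

```python
import math


def normalized_corr(x, a, b, n):
    """Normalized correlation between x[a:a+n] and x[b:b+n], in [-1, 1]."""
    s_ab = sum(x[a + i] * x[b + i] for i in range(n))
    e_a = sum(x[a + i] ** 2 for i in range(n))
    e_b = sum(x[b + i] ** 2 for i in range(n))
    denom = math.sqrt(e_a * e_b)
    return s_ab / denom if denom > 0.0 else 0.0


def merged_corr(x, center, lag, win):
    """Average of forward (t vs t+lag) and backward (t vs t-lag) correlations.

    ASSUMPTION: simple averaging is a hypothetical merge rule for illustration,
    not the paper's definition of merged forward-backward correlation.
    """
    scores = []
    if center + lag + win <= len(x):
        scores.append(normalized_corr(x, center, center + lag, win))  # forward
    if center - lag >= 0:
        scores.append(normalized_corr(x, center, center - lag, win))  # backward
    return sum(scores) / len(scores) if scores else None


def estimate_f0(x, fs, center, win, f0_min=60.0, f0_max=400.0, ratio=0.9):
    """Return (f0_hz, score) for the frame starting at `center`, or (None, 0.0).

    Picks the shortest lag that is a local peak of the merged correlation and
    reaches `ratio` times the global maximum; this guards against octave errors
    caused by equally high peaks at multiples of the true period.
    """
    lag_min = max(1, int(fs / f0_max))
    lag_max = int(fs / f0_min)
    scores = {}
    for lag in range(lag_min, lag_max + 1):
        s = merged_corr(x, center, lag, win)
        if s is not None:
            scores[lag] = s
    if not scores:
        return None, 0.0
    best = max(scores.values())
    if best <= 0.0:  # no periodicity at all (e.g. silence)
        return None, 0.0
    for lag in sorted(scores):
        s = scores[lag]
        prev_s = scores.get(lag - 1, -math.inf)
        next_s = scores.get(lag + 1, -math.inf)
        if s >= ratio * best and s >= prev_s and s >= next_s:
            return fs / lag, s
    return None, 0.0


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
    print(estimate_f0(x, fs, center=200, win=320))
```

For a clean 200 Hz sine sampled at 8 kHz, the returned F0 should be close to 200 Hz with a score near 1. In a real system the returned score would feed a voicing decision and a contour smoother rather than being used frame by frame on its own.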
Pages: 2555-2569
Page count: 15