A Fast Method for High-Resolution Voiced/Unvoiced Detection and Glottal Closure/Opening Instant Estimation of Speech

Cited by: 30
Authors
Koutrouvelis, Andreas I. [1]
Kafentzis, George P. [2]
Gaubitch, Nikolay D. [3]
Heusdens, Richard [4]
Affiliations
[1] Delft Univ Technol, Microelect Dept, NL-2628 CD Delft, Netherlands
[2] Univ Crete, Dept Comp Sci, Iraklion 73000, Greece
[3] Delft Univ Technol, Dept Comp Sci, NL-2628 CD Delft, Netherlands
[4] Delft Univ Technol, Fac Elect Engn Math & Comp Sci, NL-2628 CD Delft, Netherlands
Keywords
Glottal closure instants (GCIs); glottal opening instants (GOIs); pitch estimation; speech analysis; voiced/unvoiced detection (VUD); LINEAR PREDICTION; EPOCH EXTRACTION; WAVE-FORM; CLOSURE INSTANTS; CLASSIFICATION; EXCITATION; RECOGNITION; ALGORITHM; QUALITY;
DOI
10.1109/TASLP.2015.2506263
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose a fast speech analysis method which simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative in order to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), the yet another GCI/GOI algorithm (YAGA) and the speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of the aforementioned methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm outperforms the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratios greater than 10 dB.
Pages: 316 - 328
Number of pages: 13
Related papers
  • [1] Precise glottal closure instant detector for voiced speech
    Hahn, M
    Kang, DG
    ELECTRONICS LETTERS, 1996, 32 (23) : 2117 - 2118
  • [2] Glottal Closure and Opening Instant Detection from Speech Signals
    Drugman, Thomas
    Dutoit, Thierry
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2859 - 2862
  • [3] The DYPSA algorithm for estimation of glottal closure instants in voiced speech
    Kounoudes, A
    Naylor, PA
    Brookes, M
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 349 - 352
  • [4] Estimation of glottal closure instants in voiced speech using the DYPSA algorithm
    Naylor, Patrick A.
    Kounoudes, Anastasis
    Gudnason, Jon
    Brookes, Mike
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01) : 34 - 43
  • [5] Robust and High-resolution Voiced/Unvoiced Classification in Noisy Speech Using A Signal Smoothness Criterion
    Murthy, A. Sreenivasa
    Sekhar, S. Chandra
    Sreenivas, T. V.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2260 - 2263
  • [6] Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm
    Thomas, Mark R. P.
    Gudnason, Jon
    Naylor, Patrick A.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01) : 82 - 91
  • [7] Multiscale product of electroglottogram signal for glottal closure and opening instant detection
    Bouzid, A.
    Ellouze, N.
    2006 IMACS: MULTICONFERENCE ON COMPUTATIONAL ENGINEERING IN SYSTEMS APPLICATIONS, VOLS 1 AND 2, 2006, : 106 - +
  • [8] HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH
    Kafentzis, George P.
    Stylianou, Yannis
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4985 - 4989
  • [9] Context-Aware XGBoost for Glottal Closure Instant Detection in Speech Signal
    Matousek, Jindrich
    Vrastil, Michal
    TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 446 - 455
  • [10] The Estimation Of Glottal Closure Instants In Voiced Speech Using Fractional B-Spline Wavelets
    Emerich, Simina
    Lupu, Eugen
    Apatean, Anca
    ANALYSIS OF BIOMEDICAL SIGNALS AND IMAGES, 2008, : 537 - 540