F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech

Cited by: 2
Authors
Zhang, Yixuan [1]
Wang, Heming [1]
Wang, Deliang [2, 3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Estimation; Noise measurement; Multitasking; Speech enhancement; Convolution; Training; Speech processing; Complex domain processing; densely-connected convolutional recurrent neural network; multi-task learning; neural cascade architecture; pitch tracking; voicing detection; MULTIPITCH TRACKING; PITCH; ALGORITHM; MASKING; ROBUST;
DOI
10.1109/TASLP.2023.3313427
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
As a fundamental problem in speech processing, pitch tracking has been studied for decades. While strong performance has been achieved on clean speech, pitch tracking in noisy speech is still challenging. Severe non-stationary noises not only corrupt the harmonic structure in voiced intervals but also make it difficult to determine the existence of voiced speech. Given the importance of voicing detection for pitch tracking, this study proposes a neural cascade architecture that jointly performs pitch estimation and voicing detection. The cascade architecture optimizes a speech enhancement module and a pitch tracking module, and is trained in a speaker-independent and noise-independent way. It is observed that incorporating the enhancement module improves both pitch estimation and voicing detection accuracy, especially in low signal-to-noise ratio (SNR) conditions. In addition, compared with frameworks that combine corresponding single-task models, the proposed multi-task framework achieves better performance and is more efficient. Experimental results show that the proposed method is robust to different noise conditions and substantially outperforms other competitive pitch tracking methods.
Pages: 3760-3770
Number of pages: 11
Related papers
50 in total
  • [31] Model Counting Meets F0 Estimation
    Pavan, A.
    Vinodchandran, N. V.
    Bhattacharyya, Arnab
    Meel, Kuldeep S.
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2023, 48 (03):
  • [32] Model Counting meets F0 Estimation
    Pavan, A.
    Vinodchandran, N. V.
    Bhattacharyya, Arnab
    Meel, Kuldeep S.
    PODS '21: PROCEEDINGS OF THE 40TH SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2021, : 299 - 311
  • [33] REDUCING F0 FRAME ERROR OF F0 TRACKING ALGORITHMS UNDER NOISY CONDITIONS WITH AN UNVOICED/VOICED CLASSIFICATION FRONTEND
    Chu, Wei
    Alwan, Abeer
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3969 - 3972
  • [34] DECLINATION OF FUNDAMENTAL FREQUENCY (F0) IN SPEECH PRODUCTION
    COOPER, WE
    SORENSEN, JM
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 : S67 - S67
  • [35] F0 analysis for Japanese conversational speech synthesis
    Nakajima, Hideharu
    Sagisaka, Yoshinori
    2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 137 - +
  • [36] Speech-in-speech perception: The role of F0, rate, and rhythm
    Fishero, Sheyenne
    Jongman, Allard
    Sereno, Joan
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [37] Multi-Microphone Periodicity Function for Robust F0 Estimation in Real Noisy and Reverberant Environments
    Flego, Federico
    Omologo, Maurizio
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2146 - 2149
  • [38] EFFECTIVENESS OF FUNDAMENTAL FREQUENCY (F0) AND STRENGTH OF EXCITATION (SOE) FOR SPOOFED SPEECH DETECTION
    Patel, Tanvina B.
    Patil, Hemant A.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5105 - 5109
  • [39] Energy and F0 contour modeling with Functional Data Analysis for Emotional Speech Detection
    Pablo Arias, Juan
    Busso, Carlos
    Becerra Yoma, Nestor
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2870 - 2874
  • [40] Robust F0 estimation using ELS-based robust complex speech analysis
    Funaki, Keiichi
    Kinjo, Tatsuhiko
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2008, E91A (03) : 868 - 871