F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech

Cited by: 2

Authors:

Zhang, Yixuan [1]

Wang, Heming [1]

Wang, Deliang [2,3]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2023 / Vol. 31

Keywords:

Estimation; Noise measurement; Multitasking; Speech enhancement; Convolution; Training; Speech processing; Complex domain processing; densely-connected convolutional recurrent neural network; multi-task learning; neural cascade architecture; pitch tracking; voicing detection; MULTIPITCH TRACKING; PITCH; ALGORITHM; MASKING; ROBUST;

DOI:

10.1109/TASLP.2023.3313427

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

As a fundamental problem in speech processing, pitch tracking has been studied for decades. While strong performance has been achieved on clean speech, pitch tracking in noisy speech is still challenging. Severe non-stationary noises not only corrupt the harmonic structure in voiced intervals but also make it difficult to determine the existence of voiced speech. Given the importance of voicing detection for pitch tracking, this study proposes a neural cascade architecture that jointly performs pitch estimation and voicing detection. The cascade architecture optimizes a speech enhancement module and a pitch tracking module, and is trained in a speaker-independent and noise-independent way. It is observed that incorporating the enhancement module improves both pitch estimation and voicing detection accuracy, especially in low signal-to-noise ratio (SNR) conditions. In addition, compared with frameworks that combine corresponding single-task models, the proposed multi-task framework achieves better performance and is more efficient. Experimental results show that the proposed method is robust to different noise conditions and substantially outperforms other competitive pitch tracking methods.


Pages: 3760 - 3770

Number of pages: 11

