Neural Network Based Pitch Tracking in Very Noisy Speech

Cited by: 73
Authors
Han, Kun [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Deep neural networks (DNNs); pitch estimation; recurrent neural networks (RNNs); supervised learning; Viterbi decoding; MULTIPITCH TRACKING; ALGORITHM; RECOGNITION; DATABASE; ROBUST
DOI
10.1109/TASLP.2014.2363410
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Pitch determination is a fundamental problem in speech processing, which has been studied for decades. However, it is challenging to determine pitch in strong noise because the harmonic structure is corrupted. In this paper, we estimate pitch using supervised learning, where the probabilistic pitch states are directly learned from noisy speech data. We investigate two alternative neural networks modeling pitch state distribution given observations. The first one is a feedforward deep neural network (DNN), which is trained on static frame-level acoustic features. The second one is a recurrent deep neural network (RNN), which is trained on sequential frame-level features and is capable of learning temporal dynamics. Both DNNs and RNNs produce accurate probabilistic outputs of pitch states, which are then connected into pitch contours by Viterbi decoding. Our systematic evaluation shows that the proposed pitch tracking algorithms are robust to different noise conditions and can even be applied to reverberant speech. The proposed approach also significantly outperforms other state-of-the-art pitch tracking algorithms.
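The decoding step described in the abstract, where frame-level pitch-state probabilities are connected into a contour, can be illustrated with a generic Viterbi pass. This is a minimal sketch and not the authors' implementation: the function name `viterbi_pitch_track`, the use of posteriors directly as emission scores, the uniform initial prior, and the "sticky" transition matrix in the usage note are all assumptions made for illustration.

```python
import numpy as np


def viterbi_pitch_track(posteriors, transition):
    """Return the most likely pitch-state sequence.

    posteriors: (T, S) array of per-frame state probabilities, e.g. the
                output of a DNN or RNN (assumed to be used directly as
                emission scores, with a uniform initial prior).
    transition: (S, S) array, transition[i, j] = P(state j | state i).
                Hypothetical here; in practice it would be estimated from data.
    """
    eps = 1e-12  # avoid log(0)
    log_post = np.log(posteriors + eps)
    log_trans = np.log(transition + eps)
    n_frames, n_states = log_post.shape

    score = np.empty((n_frames, n_states))
    backptr = np.zeros((n_frames, n_states), dtype=int)
    score[0] = log_post[0]

    for t in range(1, n_frames):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score[t] = cand[backptr[t], np.arange(n_states)] + log_post[t]

    # Backtrace from the best final state
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(n_frames - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path
```

As a usage illustration under the same assumptions, take three states and a transition matrix with 0.8 on the diagonal and 0.1 elsewhere. For the posteriors `[[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.4, 0.5], [0.1, 0.8, 0.1]]`, a frame-wise argmax jumps to state 2 in the third frame, while the decoded path stays in state 1 throughout, which is the kind of contour smoothing the decoding step provides.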
Pages: 2158-2168
Page count: 11
Related papers
50 in total
  • [41] Multi-band summary correlogram-based pitch detection for noisy speech
    Tan, Lee Ngee
    Alwan, Abeer
    [J]. SPEECH COMMUNICATION, 2013, 55 (7-8) : 841 - 856
  • [42] Pitch Estimation Based on the Cepstrum Analysis by the Multi Scale Product of Clean and Noisy Speech
    Jlassi, Wided
    Bouzid, Aicha
    Ellouze, Noureddine
    [J]. RECENT ADVANCES IN NONLINEAR SPEECH PROCESSING, 2016, 48 : 219 - 225
  • [43] A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition
    Morales-Cordovilla, Juan A.
    Cabanas-Molero, Pablo
    Peinado, Antonio M.
    Sanchez, Victoria
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, 2012, 328 : 197 - 206
  • [44] Pitch Detection Method for Noisy Speech Signals Based on Wavelet Transform and Autocorrelation Function
    Li Ru-wei
    Cao Long-tao
    Li Yang
    [J]. 2013 NINTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2013), 2013, : 153 - 156
  • [45] Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions
    Nam, Youngja
    Lee, Chankyu
    [J]. SENSORS, 2021, 21 (13)
  • [46] A novel HMM and wavelet neural network hybrid method for noisy speech recognition
    Lin, SF
    Pan, YX
    Guo, HJ
    [J]. ISTM/2005: 6TH INTERNATIONAL SYMPOSIUM ON TEST AND MEASUREMENT, VOLS 1-9, CONFERENCE PROCEEDINGS, 2005, : 1057 - 1060
  • [47] Robust neural tracking of linguistic speech representations using a convolutional neural network
    Puffay, Corentin
    Vanthornhout, Jonas
    Gillis, Marlies
    Accou, Bernd
    Van Hamme, Hugo
    Francart, Tom
    [J]. JOURNAL OF NEURAL ENGINEERING, 2023, 20 (04)
  • [48] Neural network based detection of heterogeneities in noisy images
    Abramov, S.
    Naumenko, A.
    Lukin, V.
    Krivenko, S.
    Kaluzhinov, I.
    [J]. Telecommunications and Radio Engineering (English translation of Elektrosvyaz and Radiotekhnika), 2020, 79 (19): : 1691 - 1705
  • [49] A MULTIPITCH TRACKING ALGORITHM FOR NOISY AND REVERBERANT SPEECH
    Jin, Zhaozhang
    Wang, DeLiang
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4218 - 4221
  • [50] A Multiple Functions Multiplication Approach for Pitch Extraction of Noisy Speech
    Rahman, Md. Saifur
    Sugiura, Yosuke
    Shimamura, Tetsuya
    [J]. 2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019,