Neural Network Based Pitch Tracking in Very Noisy Speech

Cited by: 73
Authors
Han, Kun [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Deep neural networks (DNNs); pitch estimation; recurrent neural networks (RNNs); supervised learning; Viterbi decoding; MULTIPITCH TRACKING; ALGORITHM; RECOGNITION; DATABASE; ROBUST
DOI
10.1109/TASLP.2014.2363410
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Pitch determination is a fundamental problem in speech processing that has been studied for decades. However, it is challenging to determine pitch in strong noise because the harmonic structure is corrupted. In this paper, we estimate pitch using supervised learning, where the probabilistic pitch states are directly learned from noisy speech data. We investigate two alternative neural networks that model the pitch state distribution given the observations. The first is a feedforward deep neural network (DNN), which is trained on static frame-level acoustic features. The second is a recurrent deep neural network (RNN), which is trained on sequential frame-level features and is capable of learning temporal dynamics. Both DNNs and RNNs produce accurate probabilistic outputs of pitch states, which are then connected into pitch contours by Viterbi decoding. Our systematic evaluation shows that the proposed pitch tracking algorithms are robust to different noise conditions and can even be applied to reverberant speech. The proposed approach also significantly outperforms other state-of-the-art pitch tracking algorithms.
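The abstract's last processing step, connecting per-frame pitch-state probabilities into a contour by Viterbi decoding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_decode`, the array layout, the uniform default prior, and the transition matrix are all assumptions made for illustration, and the record does not say how the paper defines its states or transition probabilities.

```python
import numpy as np


def viterbi_decode(posteriors, transition, initial=None):
    """Return the most likely state sequence (hypothetical helper).

    posteriors: (T, S) array of per-frame pitch-state probabilities,
                e.g. the output layer of a DNN or RNN (assumed layout).
    transition: (S, S) array, transition[i, j] = P(state j | previous state i).
    initial:    (S,) prior over states at frame 0; uniform if omitted.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    transition = np.asarray(transition, dtype=float)
    num_frames, num_states = posteriors.shape
    if initial is None:
        initial = np.full(num_states, 1.0 / num_states)

    eps = 1e-12  # avoids log(0) for zero-probability entries
    log_post = np.log(posteriors + eps)
    log_trans = np.log(transition + eps)

    # score[j] = best log-probability of any path ending in state j
    score = np.log(np.asarray(initial, dtype=float) + eps) + log_post[0]
    backptr = np.zeros((num_frames, num_states), dtype=int)

    for t in range(1, num_frames):
        # cand[i, j] = score of reaching state j at frame t from state i
        cand = score[:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(num_states)] + log_post[t]

    # Trace back from the best final state
    path = np.zeros(num_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(num_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

With a transition matrix that favors staying in the same state, an isolated frame whose highest probability falls on a neighboring state tends to be pulled back onto the surrounding contour. With a uniform transition matrix the result reduces to the frame-wise argmax.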
Pages: 2158-2168
Page count: 11