Direct F0 Estimation with Neural-Network-based Regression

Times cited: 5
Authors
Xu, Shuzhuang [1]
Shimodaira, Hiroshi [2]
Affiliations
[1] University of Edinburgh, School of Informatics, Edinburgh, Midlothian, Scotland
[2] University of Edinburgh, Centre for Speech Technology Research, Edinburgh, Midlothian, Scotland
Keywords
fundamental frequency; pitch tracking; neural network; PITCH; TRACKING
DOI
10.21437/Interspeech.2019-3267
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Pitch tracking, or the continuous extraction of fundamental frequency from speech waveforms, is of vital importance to many applications in speech analysis and synthesis. Speech pitch tracking has been investigated extensively, both by conventional trackers such as Praat, RAPT and YIN and by newly proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG. This work developed a different end-to-end regression model based on neural networks, in which a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpasses canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean conditions, and neural-network classifiers under noisy conditions using noise from the NOISEX-92 corpus.
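The abstract describes two per-frame outputs, a voicing decision from the voice detector and an F0 value from the value estimator, and reports results as gross error rate. The sketch below illustrates that pipeline under assumptions that the abstract does not state: the detector emits a probability thresholded at 0.5, unvoiced frames are marked as 0 Hz, and gross error rate follows one common convention (the percentage of frames voiced in both reference and estimate whose F0 deviates by more than 20% from the reference). The function names combine_track and gross_error_rate are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): merge a voicing probability with
# a regressed F0 value per frame, then score the track with gross error rate.


def combine_track(voicing_prob, f0_estimate, threshold=0.5):
    """Return an F0 track in Hz, using 0.0 to mark unvoiced frames.

    The 0.5 threshold is an assumption, not a value from the paper.
    """
    if len(voicing_prob) != len(f0_estimate):
        raise ValueError("inputs must have the same number of frames")
    # Keep the regressed value only where the detector judges the frame voiced.
    return [f0 if p >= threshold else 0.0 for p, f0 in zip(voicing_prob, f0_estimate)]


def gross_error_rate(reference, estimate, tolerance=0.2):
    """Percentage of frames voiced in both tracks (F0 > 0) whose estimate
    deviates from the reference by more than `tolerance` (relative).

    Conventions differ between papers; this is one common variant.
    """
    if len(reference) != len(estimate):
        raise ValueError("tracks must have the same number of frames")
    both_voiced = [(r, e) for r, e in zip(reference, estimate) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0
    errors = sum(1 for r, e in both_voiced if abs(e - r) > tolerance * r)
    return 100.0 * errors / len(both_voiced)


if __name__ == "__main__":
    # Toy frames: the detector rejects frames 0 and 4; frame 2 is off by ~36%.
    reference = [0.0, 100.0, 110.0, 120.0, 0.0]
    prob = [0.1, 0.9, 0.8, 0.7, 0.2]
    raw_f0 = [90.0, 102.0, 150.0, 119.0, 80.0]
    track = combine_track(prob, raw_f0)
    print(track)  # [0.0, 102.0, 150.0, 119.0, 0.0]
    print(round(gross_error_rate(reference, track), 2))  # 33.33
```

Separating the voicing decision from the value estimate lets the regression output stay continuous while the detector alone controls which frames count as voiced; how the paper actually couples the two is not specified in the abstract.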
Pages: 1995-1999
Page count: 5