Direct F0 Estimation with Neural-Network-based Regression

Cited by: 5
Authors
Xu, Shuzhuang [1]
Shimodaira, Hiroshi [2]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
Keywords
fundamental frequency; pitch tracking; neural network
DOI
10.21437/Interspeech.2019-3267
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Pitch tracking, the continuous extraction of fundamental frequency (F0) from speech waveforms, is vital to many applications in speech analysis and synthesis. Speech pitch tracking has been investigated extensively, both with conventional trackers such as Praat, RAPT and YIN and with more recently proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG. This work develops a different end-to-end regression model based on neural networks, in which a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpasses canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean conditions and neural-network classifiers under noisy conditions, using noise from the NOISEX-92 corpus.
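The abstract reports results in terms of gross error rate without defining it. As a rough illustration only, the sketch below computes one common form of the metric: the fraction of reference-voiced frames whose estimate deviates from the reference by more than a relative tolerance. The 20% tolerance, the convention that 0 Hz marks an unvoiced frame, and the function name `gross_error_rate` are assumptions made for this example, not details taken from the paper.

```python
def gross_error_rate(f0_ref, f0_est, tol=0.2):
    """Fraction of reference-voiced frames with a relative F0 error above tol.

    Assumptions (not from the paper): a value of 0 Hz marks an unvoiced
    frame, and tol=0.2 (20%) is the gross-error threshold.
    """
    if len(f0_ref) != len(f0_est):
        raise ValueError("reference and estimate must have the same length")

    # Keep only frames that the reference labels as voiced.
    voiced = [(ref, est) for ref, est in zip(f0_ref, f0_est) if ref > 0]
    if not voiced:
        return 0.0

    # A frame is a gross error when |est - ref| exceeds tol * ref.
    # An estimate of 0 Hz on a voiced frame therefore counts as an error.
    gross = sum(1 for ref, est in voiced if abs(est - ref) > tol * ref)
    return gross / len(voiced)


# Hypothetical frame-level F0 tracks in Hz (0.0 = unvoiced).
f0_ref = [0.0, 100.0, 110.0, 120.0, 0.0]
f0_est = [0.0, 102.0, 220.0, 118.0, 90.0]
ger = gross_error_rate(f0_ref, f0_est)
```

In this toy input, only the octave error on the third frame exceeds the tolerance, so one of the three voiced frames is counted. Other definitions restrict the count to frames that both the reference and the estimate mark as voiced, which would change the denominator.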
Pages: 1995-1999
Page count: 5