Direct F0 Estimation with Neural-Network-based Regression

Cited by: 5

Authors:

Xu, Shuzhuang [1]

Shimodaira, Hiroshi [2]

Affiliations:

[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland

[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2019 | 2019

Keywords:

fundamental frequency; pitch tracking; neural network; PITCH; TRACKING;

DOI:

10.21437/Interspeech.2019-3267

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Pitch tracking, or the continuous extraction of fundamental frequency from speech waveforms, is of vital importance to many applications in speech analysis and synthesis. Speech pitch tracking has been investigated extensively through many existing trackers, including conventional ones such as Praat, RAPT and YIN, and more recently proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG. This work developed a different end-to-end regression model based on neural networks, in which a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpassed canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean conditions and neural-network classifiers under noisy conditions using the NOISEX-92 corpus.
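
The abstract reports results in terms of gross error rate but does not define the metric. The sketch below shows one common formulation from the pitch-tracking literature, not necessarily the one the authors used: among frames that both the reference and the estimate mark as voiced, count those where the estimate deviates from the reference by more than a relative threshold. The function name `gross_error_rate`, the 20% threshold, and the convention that an F0 of 0 means unvoiced are all assumptions made for illustration.

```python
def gross_error_rate(f0_ref, f0_est, threshold=0.2):
    """Fraction of jointly voiced frames whose estimate is off by more than
    `threshold` (relative to the reference F0).

    Assumptions (not taken from the abstract): F0 values are in Hz, a value
    of 0 marks an unvoiced frame, and the threshold defaults to 20%.
    """
    if len(f0_ref) != len(f0_est):
        raise ValueError("reference and estimate must have the same length")

    # Keep only frames that both tracks consider voiced.
    voiced_pairs = [(r, e) for r, e in zip(f0_ref, f0_est) if r > 0 and e > 0]
    if not voiced_pairs:
        return 0.0

    # A gross error is a relative deviation larger than the threshold.
    gross = sum(1 for r, e in voiced_pairs if abs(e - r) / r > threshold)
    return gross / len(voiced_pairs)


# Hypothetical frame-level F0 tracks in Hz (0 = unvoiced).
reference = [100.0, 200.0, 0.0, 150.0]
estimate = [105.0, 300.0, 120.0, 0.0]
print(gross_error_rate(reference, estimate))  # 0.5
```

Only the first two frames are voiced in both tracks here; the second deviates by 50%, so one of two frames counts as a gross error. Voicing decision errors (frames 3 and 4) are deliberately excluded and would be measured separately.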


Pages: 1995-1999

Page count: 5
