TUSK: A framework for overviewing the performance of F0 estimators

Cited by: 2

Authors:

Morise, Masanori [1]

Kawahara, Hideki [2]

Affiliations:

[1] Univ Yamanashi, Interdisciplinary Grad Sch, Kofu, Yamanashi, Japan

[2] Wakayama Univ, Fac Engn, Wakayama, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Speech analysis; fundamental frequency; temporal variation; noise robustness; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH EXTRACTION; TANDEM-STRAIGHT; SPEECH;

DOI:

10.21437/Interspeech.2016-140

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This article presents a framework for overviewing the performance of fundamental frequency (F0) estimators and evaluates its effectiveness. Over the past few decades, many F0 estimators and evaluation indices have been proposed and have been evaluated using various speech databases. In speech analysis/synthesis research, modern estimators are used as the algorithm to fulfill the demand for high-quality speech synthesis, but at the same time, they are competing with one another on minor issues. Specifically, while all of them meet the demands for high-quality speech synthesis, the result depends on the speech database used in the evaluation. Since there are various types of speech, it is inadvisable to discuss the effectiveness of each estimator on the basis of minor differences. It would be better to select the appropriate F0 estimator in accordance with the speech characteristics. The framework we propose, TUSK, does not rank the estimators but rather attempts to overview them. In TUSK, six parameters are introduced to observe the trends in the characteristics in each F0 estimator. The signal is artificially generated so that six parameters can be controllable independently. In this article, we introduce the concept of TUSK and determine its effectiveness using several modern F0 estimators.


Pages: 1790-1794

Page count: 5
