Analysing fundamental frequency contours and local speech rate in map task dialogs

Times cited: 17

Authors:

Mixdorff, H

Pfitzinger, HR

Affiliations:

[1] TFH Berlin Univ Appl Sci, Dept Comp Sci & Media, D-13353 Berlin, Germany

[2] Univ Munich, Dept Phonet & Speech Commun, D-80799 Munich, Germany

Source:

SPEECH COMMUNICATION | 2005 / Vol. 46 / Issue 3-4

Keywords:

Fujisaki model; perceptual local speech rate; F0; contours; map task;

DOI:

10.1016/j.specom.2005.02.019

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper reports first results from the analysis of task-oriented dialogs using a Fujisaki model-based parameterization of F0 contours and a model of the perceptual local speech rate. Two versions of map task style dialogs were examined: (1) the recordings made during the map task proper, and (2) readings, by the same subjects, of scripts of the original dialogs. The first part of the paper presents an analysis of phrase boundaries with respect to form and function. It also addresses the problem of processing fillers, hesitations and repairs within the framework of the Fujisaki model-based analysis. The second part of the paper describes a comparative analysis of spontaneous and read versions of the same dialog fragments with respect to Fujisaki model parameters, contours of the perceptual local speech rate, and other features. In a perception test, we asked listeners to identify the speaking style of dialog fragments; this proved possible for only part of the data. Nevertheless, analysis of accent commands and perceptual local speech rate contours suggested differences between the two speaking styles. The number of accented syllables, the amplitudes of the associated accent commands, and the perceptual local speech rate were generally higher in the read than in the spontaneous utterances. These differences approached significance, even though the subjects had re-enacted the read versions well, so that those versions did not fully exhibit typical reading-style characteristics. Despite this drawback, the methodology presented here has strong potential for further comparative prosodic studies of speaking styles. (c) 2005 Elsevier B.V. All rights reserved.
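The abstract refers to a Fujisaki model-based parameterization of F0 contours without stating the model. The sketch below assumes the commonly used formulation, in which ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)]: phrase commands (onset T0, magnitude Ap) drive the impulse response Gp(t) = α²·t·e^(−αt), and accent commands (onset T1, offset T2, amplitude Aa) drive the step response Ga(t) = min[1 − (1 + βt)·e^(−βt), γ], both zero for t < 0. The constants α, β, γ and the helper names (`phrase_response`, `accent_response`, `fujisaki_f0`) are illustrative assumptions, not values or code taken from this paper.

```python
import math

# Assumed typical constants (illustrative, not taken from this paper):
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism, 1/s
BETA = 20.0   # natural angular frequency of the accent control mechanism, 1/s
GAMMA = 0.9   # ceiling level of the accent component


def phrase_response(t, alpha=ALPHA):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """Hypothetical helper: model F0 in Hz at time t (seconds).

    fb          -- base frequency Fb in Hz
    phrase_cmds -- list of (T0, Ap): onset time and magnitude of each phrase command
    accent_cmds -- list of (T1, T2, Aa): onset, offset and amplitude of each accent command
    """
    ln_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        ln_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        # A rectangular accent command is the difference of two step responses.
        ln_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(ln_f0)


if __name__ == "__main__":
    # One phrase command at 0 s and one accent command from 0.3 s to 0.6 s.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    for i in range(0, 21, 5):
        t = i / 10
        print(f"{t:.1f} s: {fujisaki_f0(t, 80.0, phrases, accents):.1f} Hz")
```

The sketch covers only the forward direction (commands to contour). Parameterizing a measured contour means estimating Fb and the command timings and amplitudes from the data, which is not shown here.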

Pages: 310-325

Number of pages: 16
