Analysing fundamental frequency contours and local speech rate in map task dialogs

Cited by: 17
Authors
Mixdorff, H
Pfitzinger, HR
Affiliations
[1] TFH Berlin Univ Appl Sci, Dept Comp Sci & Media, D-13353 Berlin, Germany
[2] Univ Munich, Dept Phonet & Speech Commun, D-80799 Munich, Germany
Keywords
Fujisaki model; perceptual local speech rate; F0 contours; map task
DOI
10.1016/j.specom.2005.02.019
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The current paper reports first results from the analysis of task-oriented dialogs using a Fujisaki model-based parameterization of F0 contours, as well as a model of the perceptual local speech rate. Two versions of map task style dialogs were examined: (1) the recordings made during the map task proper, (2) readings from scripts of the original dialogs by the same subjects. The first part of this paper presents an analysis of phrase boundaries with respect to form and function. A second issue is the problem of processing fillers, hesitations and repairs within the framework of the Fujisaki model-based analysis. The second part of the paper describes the comparative analysis of spontaneous and read versions of the same dialog fragments with respect to Fujisaki model parameters, contours of the perceptual local speech rate, and other features. In a perception test we asked listeners to identify the speaking style of dialog fragments. Apparently this was possible only for part of the data. Analysis of accent commands and perceptual local speech rate contours still suggested differences between the two speaking styles. The number of accented syllables, the associated accent commands' amplitudes, and the perceptual local speech rate were generally higher in the read than in the spontaneous utterances. These results were almost significant despite the fact that the read version had been well re-enacted by the subjects and therefore did not exactly exhibit typical reading style characteristics. Despite this drawback, the methodology presented here has strong potential for further comparative prosodic studies of speaking styles. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 310-325
Number of pages: 16