Manipulation of the prosodic features of vocal tract length, nasality and articulatory precision using articulatory synthesis

Cited by: 16
Authors
Birkholz, Peter [1]
Martin, Lucia [2, 3]
Xu, Yi [4]
Scherbaum, Stefan [5]
Neuschaefer-Rube, Christiane [2, 3]
Affiliations
[1] Tech Univ Dresden, Inst Acoust & Speech Commun, D-01062 Dresden, Germany
[2] Univ Hosp Aachen, Dept Phoniatr Pedaudiol & Commun Disorders, Pauwelsstr 30, D-52074 Aachen, Germany
[3] Rhein Westfal TH Aachen, Pauwelsstr 30, D-52074 Aachen, Germany
[4] UCL, Dept Speech Hearing & Phonet Sci, Chandler House,2 Wakefield St, London, England
[5] Tech Univ Dresden, Dept Psychol, D-01062 Dresden, Germany
Keywords
Prosody; Feature manipulation; Articulatory synthesis; Speech synthesis system; Vowel reduction; Voice quality; Emotions; Expression; Personality; Speaking; Stress
DOI
10.1016/j.csl.2016.06.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vocal emotions, as well as different speaking styles and speaker traits, are characterized by a complex interplay of multiple prosodic features. Natural-sounding speech synthesis with the ability to control such paralinguistic aspects requires the manipulation of the corresponding prosodic features. With traditional concatenative speech synthesis, it is easy to manipulate the "primary" prosodic features pitch, duration, and intensity, but it is very hard to individually control "secondary" prosodic features like phonation type, vocal tract length, articulatory precision, and nasality. These secondary features can be controlled more directly with parametric synthesis methods. In the present study, we analyze the ability of articulatory speech synthesis to control secondary prosodic features by rule. To this end, nine German words were re-synthesized with the software VocalTractLab 2.1 and then manipulated in different ways at the articulatory level to vary vocal tract length, articulatory precision, and degree of nasality. Listening tests showed that most of the intended prosodic manipulations could be reliably identified, with recognition rates between 77% and 96%. Only the manipulations to increase articulatory precision were hardly recognized. The results suggest that rule-based manipulations in articulatory synthesis are generally sufficient for the convincing synthesis of secondary prosodic features at the word level. © 2016 Elsevier Ltd. All rights reserved.
Pages: 116-127
Number of pages: 12