Modeling Vietnamese Speech Prosody: A Step-by-Step Approach Towards an Expressive Speech Synthesis System

Authors
Mac, Dang-Khoa [1]
Tran, Do-Dat [1]
Affiliation
[1] Int Res Inst MICA, HUST CNRS UMI Grenoble INP 2954, Hanoi, Vietnam
Keywords
Text-to-speech; Vietnamese; Prosody modeling; Tones; Phrasing; Attitude; Expressive speech
DOI
10.1007/978-3-319-25660-3_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Adding expressivity to synthesized speech is one of the main strategies in speech technology. This paper summarizes our research on modeling Vietnamese prosody, with the goals of improving the naturalness of synthesized Vietnamese speech and of integrating expressivity (i.e., emotion and attitude). Based on the concept of a "rendez-vous" between linguistic levels and prosodic functions, we propose decomposing the prosody of an utterance into several components. Each component is then modeled, step by step, by an independent model: a dynamic linear segment model for tones, a relative register model for the F0 level of each syllable, a rule-based approach for phrasing, and an F0 stylization model for the expressive function. All proposed models were integrated into text-to-speech systems and evaluated in perception experiments.
Pages: 273-287
Page count: 15