Modeling Vietnamese Speech Prosody: A Step-by-Step Approach Towards an Expressive Speech Synthesis System

Citations: 0
Authors
Mac, Dang-Khoa [1]
Tran, Do-Dat [1]
Affiliations
[1] Int Res Inst MICA, HUST CNRS UMI Grenoble INP 2954, Hanoi, Vietnam
Keywords
Text-to-speech; Vietnamese; Prosody modeling; Tones; Phrasing; Attitude; Expressive speech;
DOI
10.1007/978-3-319-25660-3_23
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Adding expressivity to synthesized speech is one of the main strategies in speech technology. This paper summarizes our research on modeling Vietnamese prosody, with the goal of improving the naturalness of synthesized Vietnamese speech and of integrating expressivity (i.e. emotion/attitude). Based on the concept of a "rendez-vous" between linguistic levels and prosodic functions, we propose decomposing the prosody of an utterance into several components. Each component is then modeled, step by step, by an independent model: a dynamic linear segment model for tones, a relative register model for the F0 level of each syllable, a rule-based approach for phrasing, and an F0 stylization model for the expressive function. All proposed models were integrated into text-to-speech systems and evaluated in perception experiments.
Pages: 273-287
Page count: 15