Modeling Vietnamese Speech Prosody: A Step-by-Step Approach Towards an Expressive Speech Synthesis System

Times cited: 0

Authors:

Mac, Dang-Khoa [1]

Tran, Do-Dat [1]

Affiliation:

[1] Int Res Inst MICA, HUST CNRS UMI Grenoble INP 2954, Hanoi, Vietnam

Source:

TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2015 | 2015 / Vol. 9441

Keywords:

Text-to-speech; Vietnamese; Prosody modeling; Tones; Phrasing; Attitude; Expressive speech

DOI:

10.1007/978-3-319-25660-3_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Adding expressivity to synthesized speech is one of the main strategies in speech technology. This paper summarizes our research on modeling Vietnamese prosody, with the goal of improving the naturalness of synthesized Vietnamese speech and integrating expressivity (i.e., emotions and attitudes). Based on the concept of a "rendez-vous" between linguistic levels and prosodic functions, we propose decomposing the prosody of an utterance into several components. Each component is then modeled, step by step, by an independent model: a dynamic linear segment model for tones, a relative registers model for the F0 level of syllables, a rule-based approach for phrasing, and an F0 stylization model for the expressive function. All proposed models were integrated into text-to-speech systems and evaluated through perception experiments.


Pages: 273-287

Number of pages: 15
