Modeling Vietnamese Speech Prosody: A Step-by-Step Approach Towards an Expressive Speech Synthesis System

Cited by: 0

Authors:

Mac, Dang-Khoa [1]

Tran, Do-Dat [1]

Affiliation:

[1] Int Res Inst MICA, HUST CNRS UMI Grenoble INP 2954, Hanoi, Vietnam

Source:

TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2015 | 2015 / Vol. 9441

Keywords:

Text-to-speech; Vietnamese; Prosody modeling; Tones; Phrasing; Attitude; Expressive speech;

DOI:

10.1007/978-3-319-25660-3_23

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adding expressivity to synthesized speech is one of the main strategies in speech technologies. This paper summarizes our research on modeling Vietnamese prosody, with the goal of improving the naturalness of synthesized Vietnamese speech and of integrating expressivity (i.e., emotion and attitude). Based on the concept of a "rendez-vous" between linguistic levels and prosodic functions, we propose decomposing the prosody of an utterance into several components. Each component is then modeled, step by step, by an independent model: a dynamic linear segment model for tones, a relative register model for the F0 level of each syllable, a rule-based approach for phrasing, and an F0 stylization model for the expressive function. All proposed models were integrated into text-to-speech systems and evaluated in perception experiments.

Pages: 273-287

Number of pages: 15
