Modeling Vietnamese Speech Prosody: A Step-by-Step Approach Towards an Expressive Speech Synthesis System

Cited by: 0

Authors:

Mac, Dang-Khoa [1]

Tran, Do-Dat [1]

Affiliations:

[1] Int Res Inst MICA, HUST CNRS UMI Grenoble INP 2954, Hanoi, Vietnam

Source:

TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2015 | 2015, Vol. 9441

Keywords:

Text-to-speech; Vietnamese; Prosody modeling; Tones; Phrasing; Attitude; Expressive speech;

DOI:

10.1007/978-3-319-25660-3_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adding expressivity to synthesized speech is one of the main strategies in speech technology. This paper summarizes our research on modeling Vietnamese prosody, with the goal of improving the naturalness of synthesized Vietnamese speech and of integrating expressivity (i.e., emotion and attitude). Based on the concept of a "rendez-vous" between linguistic levels and prosodic functions, we propose decomposing the prosody of an utterance into several components. Each component is then modeled, step by step, by an independent model: a dynamic linear segment model for tones, a relative register model for the F0 level of syllables, a rule-based approach for phrasing, and an F0 stylization model for the expressive function. All proposed models were integrated into text-to-speech systems and evaluated in perception experiments.


Pages: 273-287

Page count: 15

