Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

Times cited: 2

Authors:

Zhou, Yixuan [1,4]

Song, Changhe [1]

Li, Jingbei [1]

Wu, Zhiyong [1,2]

Bian, Yanyao [3]

Su, Dan [3]

Meng, Helen [2]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China

[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[3] Tencent, Tencent AI Lab, Shenzhen, Peoples R China

[4] Tencent, Shenzhen, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

expressive speech synthesis; semantic representation enhancing; dependency parsing; graph neural network;

DOI:

10.21437/Interspeech.2022-10061

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS). With the development of large-scale pre-trained text representations, Bidirectional Encoder Representations from Transformers (BERT) has been shown to embody semantic information and has recently been applied to TTS. However, original or simply fine-tuned BERT embeddings still cannot provide all the semantic knowledge that expressive TTS models should take into account. In this paper, we propose a word-level semantic representation enhancing method based on dependency structure and pre-trained BERT embeddings. The BERT embedding of each word is reprocessed according to its specific dependencies and related words in the sentence, to generate a more effective semantic representation for TTS. To better utilize the dependency structure, a relational gated graph network (RGGN) is introduced to let semantic information flow and aggregate along the dependency structure. Experimental results show that the proposed method can further improve the naturalness and expressiveness of synthesized speech on both Mandarin and English datasets(1).
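The idea of letting word embeddings exchange information along dependency arcs can be illustrated with a small sketch. This is not the authors' RGGN: it assumes a generic relational gated graph layer that combines relation-specific message matrices (in the style of relational graph convolution) with the GRU-style gated update of gated graph neural networks. The class name RelationalGatedGraphLayer, the added "_rev" reverse-relation edges, the random weights, the 4-dimensional toy vectors standing in for word-level BERT embeddings, and the example sentence are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def rand_matrix(rng: random.Random, dim: int) -> Matrix:
    """Small random square matrix standing in for learned weights."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


class RelationalGatedGraphLayer:
    """Hypothetical RGGN-style layer: relation-specific messages + GRU update.

    Weights are random here; a real model would learn them.
    """

    def __init__(self, dim: int, relations: List[str], seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        # One message matrix per relation and one per reversed relation
        # (assumption), so information flows head->dependent and back.
        self.w_rel: Dict[str, Matrix] = {}
        for rel in relations:
            self.w_rel[rel] = rand_matrix(rng, dim)
            self.w_rel[rel + "_rev"] = rand_matrix(rng, dim)
        # GRU parameters: W_* act on the aggregated message, U_* on the state.
        self.w_z, self.u_z = rand_matrix(rng, dim), rand_matrix(rng, dim)
        self.w_r, self.u_r = rand_matrix(rng, dim), rand_matrix(rng, dim)
        self.w_h, self.u_h = rand_matrix(rng, dim), rand_matrix(rng, dim)

    def step(self, states: List[Vector], arcs: List[Tuple[int, int, str]]) -> List[Vector]:
        # Sum relation-specific messages arriving at each word.
        messages = [[0.0] * self.dim for _ in states]
        for head, dep, rel in arcs:
            messages[dep] = add(messages[dep], matvec(self.w_rel[rel], states[head]))
            messages[head] = add(messages[head], matvec(self.w_rel[rel + "_rev"], states[dep]))
        new_states = []
        for h, a in zip(states, messages):
            z = sigmoid(add(matvec(self.w_z, a), matvec(self.u_z, h)))  # update gate
            r = sigmoid(add(matvec(self.w_r, a), matvec(self.u_r, h)))  # reset gate
            rh = [ri * hi for ri, hi in zip(r, h)]
            cand = tanh(add(matvec(self.w_h, a), matvec(self.u_h, rh)))
            # Gated interpolation between the old state and the candidate.
            new_states.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)])
        return new_states

    def forward(self, states: List[Vector], arcs: List[Tuple[int, int, str]],
                num_steps: int = 2) -> List[Vector]:
        # Each step propagates information one more arc away.
        for _ in range(num_steps):
            states = self.step(states, arcs)
        return states


# Toy sentence "she reads books"; random vectors stand in for word-level BERT embeddings.
words = ["she", "reads", "books"]
emb_rng = random.Random(42)
embeddings = [[emb_rng.uniform(-1, 1) for _ in range(4)] for _ in words]
# Dependency arcs as (head_index, dependent_index, relation); "reads" is the root.
arcs = [(1, 0, "nsubj"), (1, 2, "obj")]
layer = RelationalGatedGraphLayer(dim=4, relations=["nsubj", "obj"])
enhanced = layer.forward(embeddings, arcs, num_steps=2)
```

With two propagation steps, the output for "she" already depends on "books" through the shared head "reads"; with one step it does not. The number of steps therefore controls how far along the dependency tree each word's representation can draw context.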


Pages: 5518-5522

Page count: 5
