FINE-GRAINED EMOTION STRENGTH TRANSFER, CONTROL AND PREDICTION FOR EMOTIONAL SPEECH SYNTHESIS

Times cited: 32

Authors:

Lei, Yi [1]

Yang, Shan [1]

Xie, Lei [1]

Affiliation:

[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

text-to-speech; expressive speech synthesis; emotion strength; sequence-to-sequence;

DOI:

10.1109/SLT48900.2021.9383524

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a unified model to conduct emotion transfer, control and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often needs manual labels or reference audio to determine the emotional expressions of synthesized speech. Such coarse labels cannot control the details of speech emotion, often resulting in an averaged emotion expression delivery, and it is also hard to choose suitable reference audio during inference. To conduct fine-grained emotion expression generation, we introduce phoneme-level emotion strength representations through a learned ranking function to describe the local emotion details, and the sentence-level emotion category is adopted to render the global emotions of synthesized speech. With the global render and local descriptors of emotions, we can obtain fine-grained emotion expressions from reference audio via its emotion descriptors (for transfer) or directly from phoneme-level manual labels (for control). As for the emotional speech synthesis with arbitrary text inputs, the proposed model can also predict phoneme-level emotion expressions from texts, which does not require any reference audio or manual label.
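The abstract mentions phoneme-level emotion strength obtained through a learned ranking function but gives no details. The following is only a minimal sketch of one common way such a function can be learned, assuming a linear pairwise ranker trained with a hinge loss so that emotional segments score higher than neutral ones. The function names (`train_ranker`, `emotion_strength`), the feature layout, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def train_ranker(emotional, neutral, epochs=200, lr=0.1, reg=1e-3):
    """Learn a weight vector w so that w @ x_emotional exceeds
    w @ x_neutral by a margin of 1 (pairwise hinge loss).

    emotional, neutral: arrays of shape (n_segments, n_features).
    Illustrative assumption: each row is an acoustic feature vector
    for one phoneme-level segment.
    """
    dim = emotional.shape[1]
    # Difference vectors for every (emotional, neutral) pair.
    diffs = (emotional[:, None, :] - neutral[None, :, :]).reshape(-1, dim)
    w = np.zeros(dim)
    for _ in range(epochs):
        margins = diffs @ w
        violated = diffs[margins < 1.0]  # pairs not yet ranked with margin
        grad = reg * w                   # L2 regularization term
        if len(violated):
            grad = grad - violated.sum(axis=0) / len(diffs)
        w -= lr * grad
    return w


def emotion_strength(w, feats):
    """Map ranking scores to [0, 1] by min-max normalization."""
    scores = feats @ w
    lo, hi = scores.min(), scores.max()
    if hi <= lo:
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)
```

In this sketch, `emotion_strength(w, feats)` returns one relative strength value per segment, which is the kind of local descriptor the abstract describes for transfer or manual control; how the paper itself defines, normalizes, and conditions on these values is not stated in this record.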

Pages: 423-430

Number of pages: 8

50 items in total

[1] Improving Fine-Grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis
Ye, Jianhao
He, Tianwei
Zhou, Hongbin
Ren, Kaimeng
He, Wendi
Lu, Heng
MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 196 - 207
[2] Fine-grained Noise Control for Multispeaker Speech Synthesis
Nikitaras, Karolos
Vamvoukakis, Georgios
Ellinas, Nikolaos
Klapsas, Konstantinos
Markopoulos, Konstantinos
Raptis, Spyros
Sung, June Sig
Jho, Gunu
Chalamandaris, Aimilios
Tsiakoulis, Pirros
INTERSPEECH 2022, 2022, : 828 - 832
[3] Fine-Grained Emotion Prediction by Modeling Emotion Definitions
Singh, Gargi
Brahma, Dhanajit
Rai, Piyush
Modi, Ashutosh
2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
[4] EMOTION NEURAL TRANSDUCER FOR FINE-GRAINED SPEECH EMOTION RECOGNITION
Shen, Siyuan
Gao, Yu
Liu, Feng
Wang, Hanyang
Zhou, Aimin
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 10111 - 10115
[5] MULTI-SPEAKER EMOTIONAL SPEECH SYNTHESIS WITH FINE-GRAINED PROSODY MODELING
Lu, Chunhui
Wen, Xue
Liu, Ruolan
Chen, Xiao
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5729 - 5733
[6] Text-Based Fine-Grained Emotion Prediction
Singh, Gargi
Brahma, Dhanajit
Rai, Piyush
Modi, Ashutosh
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (02) : 405 - 416
[7] MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis
Lei, Yi
Yang, Shan
Wang, Xinsheng
Xie, Lei
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 853 - 864
[8] EMOTION-CONTROLLABLE SPEECH SYNTHESIS USING EMOTION SOFT LABELS AND FINE-GRAINED PROSODY FACTORS
Luo, Xuan
Takamichi, Shinnosuke
Koriyama, Tomoki
Saito, Yuki
Saruwatari, Hiroshi
2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 794 - 799
[9] EMOQ-TTS: EMOTION INTENSITY QUANTIZATION FOR FINE-GRAINED CONTROLLABLE EMOTIONAL TEXT-TO-SPEECH
Im, Chae-Bin
Lee, Sang-Hoon
Kim, Seung-Bin
Lee, Seong-Whan
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6317 - 6321
[10] PiCo-VITS: Leveraging Pitch Contours for Fine-Grained Emotional Speech Synthesis
Wong, Kwan-yeung
Chung, Fu-lai
TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 210 - 221