An emotional speech synthesis markup language processor for multi-speaker and emotional text-to-speech applications

Cited by: 0

Authors:

Ryu, Se-Hui [1]

Cho, Hee [1]

Lee, Ju-Hyun [1]

Hong, Ki-Hyung [1]

Affiliation:

[1] Sungshin Womens Univ, Dept Serv Design Engn, 34 Da Gil 2, Bomun Ro 02844, South Korea

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2021 / Vol. 40 / No. 05

Keywords:

Text-to-speech; Markup language; Emotion; Multiple voice colors;

DOI:

10.7776/ASK.2021.40.5.523

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we designed and developed an Emotional Speech Synthesis Markup Language (SSML) processor. Multi-speaker emotional speech synthesis technology that can express multiple voice colors and emotions has been developed, and we designed Emotional SSML by extending SSML to cover multiple voice colors and emotional expressions. The Emotional SSML processor has a graphical user interface and consists of the following four components. First, a multi-speaker emotional text editor lets the user easily mark specific voice colors and emotions at desired positions in the text. Second, an Emotional SSML document generator automatically creates an Emotional SSML document from the output of the multi-speaker emotional text editor. Third, an Emotional SSML parser parses the Emotional SSML document. Last, a sequencer controls a multi-speaker emotional Text-to-Speech (TTS) engine based on the output of the Emotional SSML parser. Because it is based on SSML, an open standard that is independent of programming language and platform, the Emotional SSML processor can be easily integrated with various speech synthesis engines and facilitates the development of multi-speaker emotional text-to-speech applications.
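The parser and sequencer stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the Emotional SSML vocabulary, so the markup here is an assumption: `<voice name="...">` is borrowed from standard SSML, while the `<emotion name="...">` element, the speaker and emotion labels, and the `synthesize` callback standing in for a TTS engine are all hypothetical. The example also omits the XML namespace that a real SSML document would declare.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    """One stretch of text with the voice color and emotion in effect."""
    speaker: Optional[str]
    emotion: Optional[str]
    text: str


def parse_emotional_ssml(document: str) -> List[Segment]:
    """Flatten a (hypothetical) Emotional SSML document into ordered segments."""
    root = ET.fromstring(document)
    segments: List[Segment] = []

    def emit(text: Optional[str], speaker: Optional[str], emotion: Optional[str]) -> None:
        # Skip whitespace-only text and normalize inner whitespace.
        if text and text.strip():
            segments.append(Segment(speaker, emotion, " ".join(text.split())))

    def walk(node: ET.Element, speaker: Optional[str], emotion: Optional[str]) -> None:
        # Assumed tags: <voice name="..."> and <emotion name="...">.
        if node.tag == "voice":
            speaker = node.get("name", speaker)
        elif node.tag == "emotion":
            emotion = node.get("name", emotion)
        emit(node.text, speaker, emotion)
        for child in node:
            walk(child, speaker, emotion)
            # Text after a child element belongs to this node's context.
            emit(child.tail, speaker, emotion)

    walk(root, None, None)
    return segments


def run_sequencer(
    segments: List[Segment],
    synthesize: Callable[[str, Optional[str], Optional[str]], None],
) -> None:
    """Call the TTS engine once per segment, in document order."""
    for seg in segments:
        synthesize(seg.text, seg.speaker, seg.emotion)


# Hypothetical document; speaker and emotion labels are made up for illustration.
EXAMPLE = """
<speak>
  <voice name="speaker_a">
    <emotion name="happy">We won the game!</emotion>
    See you tomorrow.
  </voice>
  <voice name="speaker_b">
    <emotion name="sad">I missed it.</emotion>
  </voice>
</speak>
"""

if __name__ == "__main__":
    # A stub engine that only prints what it would synthesize.
    run_sequencer(
        parse_emotional_ssml(EXAMPLE),
        lambda text, speaker, emotion: print(speaker, emotion, text),
    )
```

Keeping the parser output as a flat list of segments means the sequencer needs no knowledge of the markup, which is one plausible way to let different synthesis engines be swapped in behind the same callback.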

Pages: 523-529

Page count: 7
