Prosodic reading style simulation for text-to-speech synthesis

Cited by: 0

Authors:

Jokisch, O [1]

Kruschke, H [1]

Hoffmann, R [1]

Affiliations:

[1] Tech Univ Dresden, Lab Acoust & Speech Commun, D-8027 Dresden, Germany

Source:

AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS | 2005 / Vol. 3784

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The simulation of different reading styles (mainly by adapting prosodic parameters) can improve the naturalness of synthetic speech and support more intelligent human-machine interaction. As examples, the article investigates the reading styles News and Tale. For comparison, all examined texts contained the same genre-neutral paragraphs, which were read without a specific style instruction (Normal) and also faster, slower, rather monotonously, or more emotionally, yielding corresponding artificial styles. The measured original intonation and duration style patterns control a diphone synthesizer (mapped contours). Additionally, the patterns are used to train a neural network (NN) model. In two separate listening tests, stimuli presented either as the original signal/style or with mapped or NN-generated prosodic contours were evaluated. The results show that both original utterances and artificial styles are basically perceived in their intended reading styles. Some reciprocal confusions indicate similarities between different styles, such as News and Fast, Tale and Slow, and Tale and Expressive. The confusions are more likely for synthetic speech. To produce, e.g., the complex style Tale, different features of the prosodic variations Slow and Expressive are combined. The training method for the synthetic styles requires further improvement.

Citation

Pages: 426-432

Page count: 7

