Prosodic reading style simulation for text-to-speech synthesis

被引:0
|
作者
Jokisch, O [1 ]
Kruschke, H [1 ]
Hoffmann, R [1 ]
机构
[1] Tech Univ Dresden, Lab Acoust & Speech Commun, D-8027 Dresden, Germany
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The simulation of different reading styles (mainly by adapting prosodic parameters) can improve the naturalness of synthetic speech and supports a more intelligent human machine interaction. The article exemplarily investigates the reading styles News and Tale. For comparison, all examined texts contained the same genre-neutral paragraphs which have been read without a specific style instruction: Normal but also faster, slower, rather monotone or more emotional which led to corresponding artificial styles. The measured original intonation and durations style patterns control a diphone synthesizer (mapped contours). Additionally, the patterns are used to train a neural network (NN) model. Within two separate listening tests, different stimuli presented as original signal/style, respectively, with mapped or NN generated prosodic contours have been evaluated. The results show that both, original utterances and artificial styles are basically perceived in their intended reading styles. Some reciprocal confusions indicate the similarities between different styles like News and Fast, Tale and Slow as well as Tale and Expressive. The confusions are more likely for synthetic speech. To produce c. g. the complex style Tale, different features of the prosodic variations Slow and Expressive are combined. The training method for the synthetic styles requires a further improvement.
引用
收藏
页码:426 / 432
页数:7
相关论文
共 50 条
  • [1] A prosodic model for text-to-speech synthesis in French
    Di Cristo, A
    Di Cristo, P
    Campione, E
    Véronis, J
    [J]. INTONATION: ANALYSIS, MODELLING AND TECHNOLOGY, 2000, 15 : 321 - 355
  • [2] Speech synthesis for text-to-speech alignment and prosodic feature extraction
    Malfrere, F
    Dutoit, T
    [J]. ISCAS '97 - PROCEEDINGS OF 1997 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I - IV: CIRCUITS AND SYSTEMS IN THE INFORMATION AGE, 1997, : 2637 - 2640
  • [3] A superposed prosodic model for Chinese text-to-speech synthesis
    Chen, GP
    Bailly, G
    Liu, QF
    Wang, RH
    [J]. 2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 177 - 180
  • [4] A prosodic Turkish text-to-speech synthesizer
    Vural, E
    Oflazer, K
    [J]. PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004, : 458 - 460
  • [5] A prosodic diphone database for Korean text-to-speech synthesis system
    Yoon, K
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 425 - 428
  • [6] A prosodic phrasing model for a Korean text-to-speech synthesis system
    Yoon, K
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (01): : 69 - 79
  • [7] A method for estimating prosodic symbol from text for Japanese text-to-speech synthesis
    Magata, K
    Hamagami, T
    Komura, M
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1373 - 1376
  • [8] A Prosodic Text-to-Speech System for Yoruba Language
    Akinwonmi, Akintoba Emmanuel
    Alese, Boniface Kayode
    [J]. 2013 8TH INTERNATIONAL CONFERENCE FOR INTERNET TECHNOLOGY AND SECURED TRANSACTIONS (ICITST), 2013, : 630 - 635
  • [9] ON GRANULARITY OF PROSODIC REPRESENTATIONS IN EXPRESSIVE TEXT-TO-SPEECH
    Babianski, Mikolaj
    Pokora, Kamil
    Shah, Raahil
    Sienkiewicz, Rafal
    Korzekwa, Daniel
    Klimkov, Viacheslav
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 892 - 899
  • [10] Prosodic Annotation in a Thai Text-to-speech System
    Potisuk, Siripong
    [J]. PACLIC 21: THE 21ST PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2007, : 405 - 414