A RULE BASED PROSODY MODEL FOR TURKISH TEXT-TO-SPEECH SYNTHESIS

Authors
Uslu, Ibrahim Baran [1]
Ilk, Hakki Gokhan [2]
Yilmaz, Asim Egemen [2]
Affiliations
[1] Atilim Univ, Fac Engn, Elect Elect Eng Dept, TR-06836 Incek Ankara, Turkey
[2] Ankara Univ, Fac Engn, Elect Elect Eng Dept, TR-06100 Tandogan, Turkey
Source
TEHNICKI VJESNIK-TECHNICAL GAZETTE | 2013 / Vol. 20 / No. 02
Keywords
CMOS test; diphone; natural speech; prosody; PSOLA; text-to-speech synthesis (TTS); verb inflection
DOI
Not available
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper presents a novel prosody model for a Turkish text-to-speech synthesis (TTS) system. After developing a TTS system driven by parametric features consisting of duration, pitch and energy modifications, we derive prosody rules to increase the naturalness of the synthesizer. Since inflected verbs in Turkish can stand alone as sentences by virtue of the suffixes they take, we build a perceptual prosody model by defining rules on the stress patterns of verb inflections. Affirmative, negative and interrogative (both positive and negative) forms of many verbs were examined systematically, and some phrases were examined in the same way to obtain proper prosody. According to the results of listening tests, the defined rules, based on duration, pitch and energy modification weights, result in perceptually better speech synthesis: an average improvement of about 1.78/5.0 in the CMOS (Comparative Mean Opinion Score) test. This improvement indicates the success of the prosody model.
Pages: 217-223
Page count: 7