A RULE BASED PROSODY MODEL FOR TURKISH TEXT-TO-SPEECH SYNTHESIS

Cited by: 0

Authors:

Uslu, Ibrahim Baran [1]

Ilk, Hakki Gokhan [2]

Yilmaz, Asim Egemen [2]

Affiliations:

[1] Atilim Univ, Fac Engn, Elect Elect Eng Dept, TR-06836 Incek Ankara, Turkey

[2] Ankara Univ, Fac Engn, Elect Elect Eng Dept, TR-06100 Tandogan, Turkey

Source:

TEHNICKI VJESNIK-TECHNICAL GAZETTE | 2013 / Vol. 20 / No. 2

Keywords:

CMOS test; diphone; natural speech; prosody; PSOLA; text-to-speech synthesis (TTS); verb inflection

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

This paper presents a novel prosody model for a Turkish text-to-speech (TTS) synthesis system. After developing a TTS system driven by parametric features, namely duration, pitch and energy modifications, we derive prosody rules to increase the naturalness of the synthesizer. Because inflected verbs in Turkish can form stand-alone sentences through the suffixes they take, we build a perceptual prosody model by defining rules on the stress patterns of verb inflections. Affirmative, negative and interrogative (both positive and negative) forms of many verbs were examined systematically, and some phrases were examined in the same way to obtain proper prosody. According to the results of listening tests, the defined rules, based on duration, pitch and energy modification weights, yield perceptually better synthesized speech: an average improvement of about 1.78/5.0 in the CMOS (Comparative Mean Opinion Score) test. This improvement supports the effectiveness of the proposed prosody model.

Pages: 217-223

Page count: 7

