Enhancing the Quality of Nepali Text-to-Speech Systems

Cited by: 0

Authors:

Ghimire, Rupak Raj [1]

Bal, Bal Krishna [1]

Affiliation:

[1] Kathmandu Univ, Dept Comp Sci & Engn, Informat & Language Proc Res Lab, Dhulikhel, Kavre, Nepal

Source:

CREATIVITY IN INTELLIGENT TECHNOLOGIES AND DATA SCIENCE (CIT&DS) | 2017 / Vol. 754

Keywords:

Speech technology; Text-to-Speech; Natural language processing; Digital signal processing; Speech synthesis; Nepali-TTS; Unit selection; Synthesized voice;

DOI:

10.1007/978-3-319-65551-2_14

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Text-to-speech (TTS) systems are a widely studied application area in Computer Science. TTS research is more active for languages with a rich set of resources, such as English, and has not been taken up as rigorously for under-resourced languages such as Nepali. Nevertheless, TTS has a wide scope of application in areas including telephony, e-learning and telecommunication. Developing a natural-sounding TTS system is difficult for under-resourced languages, primarily because of the linguistic resources the system requires. Preparing such resources is costly and time-consuming, and requires the involvement of linguists and other experts. The general trend in this research domain is to develop natural-sounding TTS from the limited resources available. Nepali, being an under-resourced language, has very few linguistic resources available for developing a TTS system. In this work, we modified the existing TTS system [9] by adding computational units that process the input and the output, which we call pre-processing and post-processing modules. We also made the system available to the public through a desktop application and a plugin for Firefox by pruning and adding phonetic rules and normalization rules. We evaluated the existing and modified TTS systems using qualitative evaluation techniques in which 30 users rated the systems on two parameters: intelligibility and naturalness. Our results show an overall improvement of 6% in naturalness and intelligibility, while the results of the comprehension test and the diagnostic rhyme test increased by 12% and 10%, respectively.

Pages: 187-197

Page count: 11

50 items in total

[1] Perceptual Quality Dimensions of Text-to-Speech Systems
Hinterleitner, Florian
Moeller, Sebastian
Norrenbrock, Christoph
Heute, Ulrich
[J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2188 - 2191
[2] Comparison of measures of speech quality for listening tests of text-to-speech systems
Viswanathan, M
Viswanathan, M
[J]. PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 11 - 14
[3] Better Human Computer Interaction by Enhancing the Quality of Text-to-Speech Synthesis
Reddy, V. Ramu
Rao, K. Sreenivasa
[J]. 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2012), 2012,
[4] Physiological Quality-of-Experience Assessment of Text-to-Speech Systems
Gupta, Rishabh
Falk, Tiago H.
[J]. 2016 IEEE 18TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2016,
[5] Comparison of Approaches for Instrumentally Predicting the Quality of Text-To-Speech Systems
Moeller, Sebastian
Hinterleitner, Florian
Falk, Tiago H.
Polzehl, Tim
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1325 - +
[6] A text analyzer for Korean text-to-speech systems
Lee, SH
Oh, YH
[J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1692 - 1695
[7] Subjective evaluation and comparison of the speech quality of text-to-speech systems for the German language
Klaus, H
Fellbaum, K
Sotscheck, J
[J]. ACUSTICA, 1997, 83 (01): : 124 - 136
[8] The use of lexica in text-to-speech systems
Quazza, S
Van den Heuvel, H
[J]. LEXICON DEVELOPMENT FOR SPEECH AND LANGUAGE PROCESSING, 2000, 12 : 207 - 233
[9] Multimodal Physiological Quality-of-Experience Assessment of Text-to-Speech Systems
Gupta, Rishabh
Banville, Hubert J.
Falk, Tiago H.
[J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (01) : 22 - 36
[10] Enhancing Sequence-to-Sequence Text-to-Speech with Morphology
Taylor, Jason
Richmond, Korin
[J]. INTERSPEECH 2020, 2020, : 1738 - 1742
