Enhancing the Quality of Nepali Text-to-Speech Systems

Authors
Ghimire, Rupak Raj [1]
Bal, Bal Krishna [1]
Affiliations
[1] Kathmandu Univ, Dept Comp Sci & Engn, Informat & Language Proc Res Lab, Dhulikhel, Kavre, Nepal
Keywords
Speech technology; Text-to-Speech; Natural language processing; Digital signal processing; Speech synthesis; Nepali-TTS; Unit selection; Synthesized voice
DOI
10.1007/978-3-319-65551-2_14
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-to-speech (TTS) systems are a widely studied application area in computer science. They are more popular for languages with a rich set of resources, such as English, and have been taken up less rigorously for under-resourced languages such as Nepali. Nevertheless, TTS has a wide scope of application in areas including telephony, e-learning, and telecommunication. Developing a natural-sounding TTS system is difficult for under-resourced languages, primarily because of the linguistic resources the system requires: preparing such resources is costly and time-consuming, and requires the involvement of linguists and other experts. The general trend in this research domain is therefore to develop natural-sounding TTS from the limited resources available. Nepali, being an under-resourced language, has very few linguistic resources available for developing a TTS system. In this work, we modified the existing TTS system [9] by adding computational units that process its input and output, which we call the pre-processing and post-processing modules. We also pruned and added phonetic rules and normalization rules, and made the system available to the public through a desktop application and a Firefox plugin. We evaluated the existing and modified TTS systems using qualitative evaluation techniques in which 30 users rated the systems on two parameters, intelligibility and naturalness. Our results show an overall improvement of 6% in naturalness and intelligibility, while the results of the comprehension test and the diagnostic rhyme test increased by 12% and 10%, respectively.
Pages: 187-197
Page count: 11