Intensity Modeling for Syllable Based Text-to-Speech Synthesis

Authors
Reddy, V. Ramu [1]
Rao, K. Sreenivasa [1]
Affiliation
[1] Indian Inst Technol, Sch Informat Technol, Kharagpur 721302, W Bengal, India
Source
CONTEMPORARY COMPUTING | 2012 / Vol. 306
Keywords
Syllable intensities; Intensity prediction; LR; CART; FFNN; Phonological; Contextual; Positional; Articulatory; Linguistic; Production; Naturalness; Intelligibility
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The quality of text-to-speech (TTS) synthesis systems can be improved by controlling the intensities of speech segments in addition to their durations and intonation. This paper proposes linguistic and production constraints for modeling the intensity patterns of a sequence of syllables. Linguistic constraints are represented by positional, contextual and phonological features, and production constraints are represented by articulatory features associated with the syllables. A feedforward neural network (FFNN) is proposed to model the syllable intensities. The proposed FFNN model is evaluated with objective measures: average prediction error (mu), standard deviation (sigma), correlation coefficient (gamma_{X,Y}) and the percentage of syllables predicted within different deviations from their actual intensities. Its prediction performance is compared with that of two other statistical models, Linear Regression (LR) and Classification and Regression Tree (CART). The models are also evaluated through subjective listening tests on speech synthesized by incorporating the predicted syllable intensities into a Bengali TTS system. The evaluation studies show that the FFNN model predicts syllable intensities more accurately than the LR and CART models.
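The objective measures named in the abstract can be illustrated with a short sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: mu is taken as the mean absolute error, sigma as the standard deviation of the absolute errors, gamma_{X,Y} as the Pearson correlation between actual and predicted values, and the deviation of a syllable as its absolute error expressed as a percentage of the actual intensity. The function name `intensity_metrics`, the thresholds and the sample values are hypothetical and are not taken from the paper.

```python
import math


def intensity_metrics(actual, predicted, thresholds=(5, 10, 25)):
    """Objective measures for predicted syllable intensities.

    Assumed definitions (not confirmed by the abstract):
      mu     - mean absolute prediction error
      sigma  - standard deviation of the absolute errors
      gamma  - Pearson correlation between actual and predicted values
      within - percentage of syllables whose error is at most t percent
               of the actual intensity, for each threshold t
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(abs_err) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in abs_err) / n)

    # Pearson correlation computed from population covariance and deviations.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    gamma = cov / (std_a * std_p)

    # Percentage deviation of each syllable relative to its actual intensity.
    pct_dev = [100.0 * e / abs(a) for e, a in zip(abs_err, actual)]
    within = {t: 100.0 * sum(d <= t for d in pct_dev) / n for t in thresholds}
    return mu, sigma, gamma, within


# Made-up intensity values, used only to exercise the function.
actual = [60.0, 70.0, 80.0, 50.0]
predicted = [62.0, 70.0, 68.0, 51.0]
mu, sigma, gamma, within = intensity_metrics(actual, predicted)
print(mu, sigma, gamma, within)
```

Reporting the share of syllables within several deviation bands complements mu and sigma, since a model can have a small mean error and still miss a few syllables by a wide margin.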
Pages: 106-117
Number of pages: 12