Intensity Modeling for Syllable Based Text-to-Speech Synthesis

Times cited: 0

Authors:

Reddy, V. Ramu [1]

Rao, K. Sreenivasa [1]

Affiliation:

[1] Indian Inst Technol, Sch Informat Technol, Kharagpur 721302, W Bengal, India

Source:

CONTEMPORARY COMPUTING | 2012 / Vol. 306

Keywords:

Syllable intensities; Intensity prediction; LR; CART; FFNN; Phonological; Contextual; Positional; Articulatory; Linguistic; Production; Naturalness; Intelligibility

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The quality of text-to-speech (TTS) synthesis systems can be improved by controlling the intensities of speech segments in addition to their durations and intonation. This paper proposes linguistic and production constraints for modeling the intensity patterns of a sequence of syllables. Linguistic constraints are represented by positional, contextual and phonological features, and production constraints are represented by the articulatory features associated with the syllables. In this work, a feedforward neural network (FFNN) is proposed to model syllable intensities. The proposed FFNN model is evaluated by means of objective measures such as the average prediction error (μ), standard deviation (σ), correlation coefficient (γ_{X,Y}) and the percentage of syllables predicted within different deviations. Its prediction performance is compared with that of other statistical models, namely Linear Regression (LR) and Classification and Regression Tree (CART) models. The models are also evaluated through subjective listening tests on synthesized speech generated by incorporating the predicted syllable intensities into a Bengali TTS system. The evaluation studies show that the FFNN models predict syllable intensities more accurately than the other models.
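The objective measures named in the abstract can be illustrated with a minimal Python sketch. The record does not give the paper's exact formulas, so the definitions here are assumptions: μ is taken as the mean absolute error between actual and predicted syllable intensities, σ as the standard deviation of those absolute errors, γ_{X,Y} as the Pearson correlation between actual (X) and predicted (Y) intensities, and each deviation band as a percentage of the syllable's actual intensity. The function name `objective_measures` and the threshold values are hypothetical.

```python
import math
from typing import Dict, Sequence


def objective_measures(actual: Sequence[float], predicted: Sequence[float],
                       thresholds_pct: Sequence[float] = (5.0, 10.0, 20.0)) -> Dict[str, float]:
    """Compute mu, sigma, gamma and within-deviation percentages.

    ASSUMED definitions (not taken from the paper): mu = mean absolute error,
    sigma = std. dev. of absolute errors, gamma = Pearson correlation,
    deviation bands = t% of each syllable's actual intensity.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)

    # Absolute prediction error for each syllable.
    abs_err = [abs(x - y) for x, y in zip(actual, predicted)]
    mu = sum(abs_err) / n

    # Spread of the absolute errors around their mean.
    sigma = math.sqrt(sum((e - mu) ** 2 for e in abs_err) / n)

    # Pearson correlation between actual and predicted intensities.
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in actual) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in predicted) / n)
    gamma = cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else float("nan")

    result = {"mu": mu, "sigma": sigma, "gamma": gamma}

    # Percentage of syllables whose error lies within t% of the actual value.
    for t in thresholds_pct:
        within = sum(1 for x, e in zip(actual, abs_err) if e <= abs(x) * t / 100.0)
        result[f"within_{t:g}pct"] = 100.0 * within / n
    return result
```

For example, with actual intensities `[70.0, 72.0, 68.0, 75.0]`, predictions `[71.0, 70.0, 69.0, 74.0]` and thresholds `(1.0, 2.0, 5.0)`, μ is 1.25 and 75% of the syllables fall within the 2% band. The same function can score the predictions of each of the compared models (FFNN, LR, CART) on a common test set.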


Pages: 106-117

Page count: 12
