Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks

Cited by: 11
Authors
Reddy, V. Ramu [1]
Rao, K. Sreenivasa [1]
Affiliations
[1] Indian Institute of Technology, School of Information Technology, Kharagpur 721302, West Bengal, India
Keywords
Prosody; Text-to-speech synthesis; Feed-forward neural networks; Phonological features; Positional and contextual features; Articulatory features; Duration
DOI
10.1016/j.neucom.2015.07.053
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prosody plays an important role in improving the quality of text-to-speech (TTS) synthesis systems. In this paper, features related to linguistic and production constraints are proposed for modeling the prosodic parameters of syllables, namely duration, intonation and intensity. The linguistic constraints are represented by positional, contextual and phonological features, and the production constraints are represented by articulatory features. Neural network models are explored to capture the implicit duration, F0 and intensity knowledge using the above-mentioned features. The prediction performance of the proposed neural network models is evaluated using objective measures: average prediction error (μ), standard deviation (σ) and linear correlation coefficient (γ_{X,Y}). The prediction accuracy of the proposed neural network models is compared with that of other state-of-the-art prosody models used in TTS systems. The prediction accuracy of the proposed prosody models is also verified by conducting listening tests after integrating the models into the baseline TTS system. © 2015 Elsevier B.V. All rights reserved.
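The three objective measures named in the abstract can be illustrated with a short sketch. The abstract does not define them, so the formulas below are assumptions based on common usage: μ is taken as the mean absolute error between actual and predicted values, σ as the standard deviation of those absolute errors, and γ_{X,Y} as the Pearson correlation between actual and predicted values. The function name `prediction_metrics` and the sample values are hypothetical.

```python
import math


def prediction_metrics(actual, predicted):
    """Return (mu, sigma, gamma) for paired actual/predicted values.

    Assumed definitions (not taken from the paper):
      mu    = mean of |actual - predicted|
      sigma = population standard deviation of |actual - predicted|
      gamma = Pearson correlation between actual and predicted
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)

    # Average prediction error and the spread of the errors around it.
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(abs_err) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in abs_err) / n)

    # Linear correlation: covariance divided by the product of the
    # standard deviations of the two sequences.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p)
              for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if std_a == 0 or std_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    gamma = cov / (std_a * std_p)
    return mu, sigma, gamma


if __name__ == "__main__":
    # Hypothetical syllable durations in milliseconds.
    actual_ms = [100, 120, 140, 160]
    predicted_ms = [110, 115, 150, 155]
    mu, sigma, gamma = prediction_metrics(actual_ms, predicted_ms)
    print(f"mu={mu:.2f} ms, sigma={sigma:.2f} ms, gamma={gamma:.3f}")
```

The same function would apply unchanged to F0 or intensity values, since the measures only compare two numeric sequences of equal length.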
Pages: 1323-1334
Number of pages: 12
Related papers
50 in total
  • [1] Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis
    Reddy, V. Ramu
    Rao, K. Sreenivasa
    [J]. COMPUTER SPEECH AND LANGUAGE, 2013, 27 (05) : 1105 - 1126
  • [2] Intensity Modeling for Syllable Based Text-to-Speech Synthesis
    Reddy, V. Ramu
    Rao, K. Sreenivasa
    [J]. CONTEMPORARY COMPUTING, 2012, 306 : 106 - 117
  • [3] Syllable based text to speech synthesis system using auto associative neural network prosody prediction
    Sangeetha, Sudhakar
    Jothilakshmi, Sekar
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2014, 17 (02) : 91 - 98
  • [4] Modeling stylized invariance and local variability of prosody in text-to-speech synthesis
    Chu, Min
    Zhao, Yong
    Chang, Eric
    [J]. SPEECH COMMUNICATION, 2006, 48 (06) : 716 - 726
  • [5] A RULE BASED PROSODY MODEL FOR TURKISH TEXT-TO-SPEECH SYNTHESIS
    Uslu, Ibrahim Baran
    Ilk, Hakki Gokhan
    Yilmaz, Asim Egemen
    [J]. TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2013, 20 (02) : 217 - 223
  • [6] Novel Eigenpitch-based Prosody Model for Text-to-Speech Synthesis
    Tian, Jilei
    Nurminen, Jani
    Kiss, Imre
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007 : 313 - 316
  • [7] Evaluation of Prosody in Text-to-Speech Synthesis System of Bangla
    Basu, Tulika
    Saha, Arup
    [J]. 2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013
  • [8] PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH
    Yi, Yuanhao
    He, Lei
    Pan, Shifeng
    Wang, Xi
    Xiao, Yujia
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 7582 - 7586
  • [9] A Novel Text-to-Speech Synthesis System Using Syllable-Based HMM for Tamil Language
    Manoharan, J. Samuel
    [J]. PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351 : 305 - 314
  • [10] Speech Modification for Prosody Conversion in Expressive Marathi Text-to-Speech Synthesis
    Anil, Manjare Chandraprabha
    Shirbahadurkar, S. D.
    [J]. 2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014 : 56 - 58