SPEECH PROSODY CONTROL USING WEIGHTED NEURAL NETWORK ENSEMBLES

Times cited: 0
Author
Romsdorfer, Harald [1]
Affiliation
[1] ETH, Speech Processing Group, Zurich, Switzerland
Keywords
speech synthesis; prosody control; neural networks; ensemble models; regression;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ensembles of artificial neural networks (ANNs) generalize better than single networks. However, aggregation is only effective if the individual networks are as accurate and as diverse as possible. This paper presents a new statistical model for prosody control that combines weighted ensembles of ANNs with feature relevance determination, an approach that allows the individual networks to be both accurate and diverse. The weighted neural network ensemble model was applied to both phone duration modeling and fundamental frequency (F0) modeling. Compared with state-of-the-art prosody models based on classification and regression trees (CART), multivariate adaptive regression splines (MARS), or ANNs, it achieves a 12% improvement over the best duration model and a 24% improvement over the best F0 model. The neural network ensemble model also outperforms another recently presented ensemble model based on gradient tree boosting.
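The abstract does not say how the member networks are weighted or how feature relevance is determined, so the following is only a minimal sketch under stated assumptions. It illustrates why the members must be both accurate and diverse: member outputs are combined by a convex weighted average, and the squared-error ambiguity decomposition of Krogh and Vedelsby (ensemble error = weighted member error minus weighted ambiguity) is checked for one input. The inverse-validation-error weighting rule, the example numbers, and the function names `inverse_error_weights`, `weighted_prediction` and `ambiguity_decomposition` are hypothetical and are not taken from the paper; feature relevance determination is not modeled.

```python
"""Hypothetical sketch: weighted averaging of ANN ensemble members and the
squared-error ambiguity decomposition (ensemble error = weighted member
error - weighted ambiguity). The inverse-error weighting rule is an
assumption, not the method of the paper."""

from collections.abc import Sequence


def inverse_error_weights(val_mse: Sequence[float]) -> list[float]:
    """Assumed rule: weight each network by 1/validation-MSE, normalized to sum to 1."""
    inverse = [1.0 / err for err in val_mse]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_prediction(member_preds: Sequence[float], weights: Sequence[float]) -> float:
    """Convex combination of the member networks' outputs for one input."""
    return sum(w * p for w, p in zip(weights, member_preds))


def ambiguity_decomposition(
    member_preds: Sequence[float], weights: Sequence[float], target: float
) -> tuple[float, float, float]:
    """Return (ensemble error, weighted member error, weighted ambiguity).

    For convex weights and squared error, the first value equals the second
    minus the third, so disagreement (ambiguity) among accurate members
    lowers the ensemble error.
    """
    f_ens = weighted_prediction(member_preds, weights)
    member_err = sum(w * (p - target) ** 2 for w, p in zip(weights, member_preds))
    ambiguity = sum(w * (p - f_ens) ** 2 for w, p in zip(weights, member_preds))
    ens_err = (f_ens - target) ** 2
    return ens_err, member_err, ambiguity


if __name__ == "__main__":
    # Hypothetical phone-duration predictions (ms) from three networks.
    preds = [70.0, 85.0, 92.0]
    weights = inverse_error_weights([100.0, 50.0, 200.0])
    ens, member, amb = ambiguity_decomposition(preds, weights, target=80.0)
    print(f"ensemble prediction: {weighted_prediction(preds, weights):.1f} ms")
    print(f"ensemble error {ens:.2f} = member error {member:.2f} - ambiguity {amb:.2f}")
```

With these made-up values (target 80 ms, validation MSEs 100, 50 and 200), the weights are 2/7, 4/7 and 1/7 and the ensemble predicts about 81.7 ms. Its squared error (about 2.94) equals the weighted member error (about 63.43) minus the ambiguity (about 60.49). Accurate members that disagree with each other therefore yield a much smaller ensemble error than their average error, which is the intuition behind requiring members that are both accurate and diverse.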
Pages: 299 - 304
Number of pages: 6
Related papers
50 in total
  • [1] Weighted Neural Network Ensemble Models for Speech Prosody Control
    Romsdorfer, Harald
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 492 - 495
  • [2] Weighted combination of neural network ensembles
    Wanas, NM
    Kamel, MS
    [J]. PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, : 1748 - 1752
  • [3] Optimisation of artificial neural network topology applied in the prosody control in text-to-speech synthesis
    Sebesta, V
    Tucková, J
    [J]. SOFSEM 2000: THEORY AND PRACTICE OF INFORMATICS, 2000, 1963 : 420 - 430
  • [4] ENTRAINMENT ANALYSIS FOR ASSESSMENT OF AUTISTIC SPEECH PROSODY USING BOTTLENECK FEATURES OF DEEP NEURAL NETWORK
    Ochi, Keiko
    Ono, Nobutaka
    Owada, Keiho
    Kuroda, Miho
    Sagayama, Shigeki
    Yamasue, Hidenori
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8492 - 8496
  • [5] ENTRAINMENT ANALYSIS FOR ASSESSMENT OF AUTISTIC SPEECH PROSODY USING BOTTLENECK FEATURES OF DEEP NEURAL NETWORK
    Ochi, Keiko
    Ono, Nobutaka
    Owada, Keiho
    Kuroda, Miho
    Sagayama, Shigeki
    Yamasue, Hidenori
    [J]. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022, 2022-May : 8492 - 8496
  • [6] Evolving neural network ensembles for control problems
    Pardoe, David
    Ryoo, Michael
    Miikkulainen, Risto
    [J]. GECCO 2005: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOLS 1 AND 2, 2005, : 1379 - 1384
  • [7] Syllable based text to speech synthesis system using auto associative neural network prosody prediction
    Sangeetha, Sudhakar
    Jothilakshmi, Sekar
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2014, 17 (02) : 91 - 98
  • [8] Polyglot Speech Prosody Control
    Romsdorfer, Harald
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 504 - 507
  • [9] Artificial neural network approach to the modelling of prosody in the speech synthesizer of the Czech language
    Tuckova, Jana
    Sebesta, Vaclav
    [J]. PROCEEDINGS OF THE 11TH IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, 2007, : 1 - 6
  • [10] Optimisation of neural network topology and input parameters for prosody modelling of synthetic speech
    Sebesta, V
    Tuckova, J
    [J]. APPLICATIONS AND SCIENCE IN SOFT COMPUTING, 2004, : 9 - 16