SPEECH PROSODY CONTROL USING WEIGHTED NEURAL NETWORK ENSEMBLES

Cited by: 0
Author
Romsdorfer, Harald [1]
Affiliation
[1] ETH, Speech Proc Grp, Zurich, Switzerland
Keywords
speech synthesis; prosody control; neural networks; ensemble models; regression
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Ensembles of artificial neural networks (ANNs) generalize better than single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. This paper presents a new statistical model for prosody control that combines weighted ensembles of ANNs with feature relevance determination, an approach that allows the individual networks to be both accurate and diverse. The weighted neural network ensemble model was applied to both phone duration modeling and fundamental frequency (F0) modeling. A comparison with state-of-the-art prosody models based on classification and regression trees (CART), multivariate adaptive regression splines (MARS), or ANNs shows a 12% improvement over the best duration model and a 24% improvement over the best F0 model. The neural network ensemble model also outperforms another recently presented ensemble model based on gradient tree boosting.
Pages: 299-304
Number of pages: 6