Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Fu, Ruibo [1 ,2 ]
Tao, Jianhua [1 ,2 ,3 ]
Zheng, Yibin [1 ,2 ]
Wen, Zhengqi [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech synthesis; progressive neural networks; acoustic modeling; transfer learning
DOI
10.21437/Interspeech.2018-1265
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The fundamental frequency and the spectrum parameters of speech are correlated, so the mapping learned from the linguistic features to one of them can be leveraged to help determine the other. Conventional methods treat all the acoustic features as one stream for acoustic modeling, while multi-task learning methods model several targets under a global cost function. To improve the accuracy of the acoustic model, our method applies progressive deep neural networks (PDNN) to acoustic modeling in statistical parametric speech synthesis (SPSS). Each type of acoustic feature is modeled in a separate sub-network with its own cost function, and knowledge is transferred through lateral connections. Each sub-network in the PDNN can be trained step by step to reach its own optimum. Experiments are conducted to compare the proposed PDNN-based SPSS system with standard DNN methods. The multi-task learning (MTL) method is also applied to the PDNN and DNN structures as a contrast experiment for the transfer learning. The computational complexity, prediction sequences, and number of hierarchies of the PDNN are investigated. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
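The idea of separate sub-networks linked by lateral connections can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `Column`, the layer sizes, the choice to feed the earlier column's hidden layer into the later column's output layer, and the order (F0 column first, spectrum column second) are all assumptions made for illustration. Training is omitted; in a progressive setup each column would be trained with its own cost function, with earlier columns typically held fixed.

```python
import random

def linear(x, w, b):
    """Affine map. x: list of floats, w: list of rows (n_out x n_in), b: list of n_out biases."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def init(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class Column:
    """One sub-network: a hidden layer plus an output layer.

    If lateral_dim > 0, the column also owns a lateral weight matrix that
    maps an earlier column's hidden activations into its own output.
    """
    def __init__(self, n_in, n_hidden, n_out, rng, lateral_dim=0):
        self.w1, self.b1 = init(n_hidden, n_in, rng)
        self.w2, self.b2 = init(n_out, n_hidden, rng)
        self.u = init(n_out, lateral_dim, rng)[0] if lateral_dim else None

    def forward(self, x, lateral_hidden=None):
        h = relu(linear(x, self.w1, self.b1))
        y = linear(h, self.w2, self.b2)
        if self.u is not None and lateral_hidden is not None:
            # Lateral connection: add the earlier column's contribution.
            lat = linear(lateral_hidden, self.u, [0.0] * len(y))
            y = [a + b for a, b in zip(y, lat)]
        return h, y

rng = random.Random(0)
f0_column = Column(n_in=4, n_hidden=3, n_out=1, rng=rng)                   # assumed: F0 stream
spec_column = Column(n_in=4, n_hidden=3, n_out=2, rng=rng, lateral_dim=3)  # assumed: spectrum stream

linguistic = [0.1, -0.2, 0.3, 0.4]  # toy linguistic feature vector
h_f0, f0 = f0_column.forward(linguistic)
_, spectrum = spec_column.forward(linguistic, lateral_hidden=h_f0)
print(f0, spectrum)
```

Both columns read the same linguistic input; only the second receives information from the first, so the knowledge flows in one direction. Swapping which feature type is modeled first gives a different prediction sequence.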
Pages: 907-911
Number of pages: 5