Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Fu, Ruibo [1, 2]
Tao, Jianhua [1, 2, 3]
Zheng, Yibin [1, 2]
Wen, Zhengqi [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech synthesis; progressive neural networks; acoustic modeling; transfer learning
DOI
10.21437/Interspeech.2018-1265
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The fundamental frequency and the spectrum parameters of speech are correlated, so the mapping learned from the linguistic features to one of them can be leveraged to help determine the other. Conventional methods treat all the acoustic features as a single stream for acoustic modeling, and multi-task learning methods model several targets under one global cost function. To improve the accuracy of the acoustic model, our method applies progressive deep neural networks (PDNN) to acoustic modeling in statistical parametric speech synthesis (SPSS). Each type of acoustic feature is modeled in a separate sub-network with its own cost function, and knowledge is transferred between sub-networks through lateral connections. Each sub-network in the PDNN can be trained step by step to reach its own optimum. Experiments are conducted to compare the proposed PDNN-based SPSS system with standard DNN methods. The multi-task learning (MTL) method is also applied to both the PDNN and DNN structures as a contrast experiment for the transfer learning. The computational complexity, the prediction sequences, and the number of hierarchies of the PDNN are investigated. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
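The sub-network and lateral-connection idea in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the `Column` class, the layer sizes, the tanh activation, the random weights, and the choice to predict F0 first and the spectrum second are all assumptions made for illustration. In a real system each column would be trained with its own cost function, and the earlier column would typically be frozen before the next one is trained.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Column:
    """One sub-network (hypothetical name) with two hidden layers.

    If n_lateral > 0, the second hidden layer also receives the first
    hidden layer of an earlier column through a lateral weight matrix U.
    """

    def __init__(self, n_in, n_hidden, n_out, rng, n_lateral=0):
        self.W1 = rand_matrix(n_hidden, n_in, rng)
        self.W2 = rand_matrix(n_hidden, n_hidden, rng)
        self.V = rand_matrix(n_out, n_hidden, rng)
        self.U = rand_matrix(n_hidden, n_lateral, rng) if n_lateral else None

    def forward(self, x, lateral_h1=None):
        h1 = [math.tanh(a) for a in matvec(self.W1, x)]
        pre2 = matvec(self.W2, h1)
        if self.U is not None and lateral_h1 is not None:
            # Lateral connection: add the earlier column's hidden features.
            pre2 = [a + b for a, b in zip(pre2, matvec(self.U, lateral_h1))]
        h2 = [math.tanh(a) for a in pre2]
        return h1, matvec(self.V, h2)


rng = random.Random(0)
# Assumed order: the F0 column comes first, the spectrum column second.
f0_col = Column(n_in=4, n_hidden=3, n_out=1, rng=rng)
spec_col = Column(n_in=4, n_hidden=3, n_out=2, rng=rng, n_lateral=3)

x = [0.1, -0.2, 0.3, 0.0]  # toy linguistic feature vector
f0_h1, f0_pred = f0_col.forward(x)
_, spec_pred = spec_col.forward(x, lateral_h1=f0_h1)
```

Here the spectrum column reuses the hidden representation that the F0 column computed from the same linguistic input, which is one way to read "knowledge transfers through lateral connections". Swapping the two columns gives the opposite prediction sequence.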
Pages: 907-911
Number of pages: 5
Related papers
50 in total
  • [31] STATISTICAL PARAMETRIC SPEECH SYNTHESIS BASED ON PRODUCT OF EXPERTS
    Zen, Heiga
    Gales, Mark J. F.
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4242 - 4245
  • [32] UFANS: U-Shaped Fully-Parallel Acoustic Neural Structure for Statistical Parametric Speech Synthesis
    Ma, Dabiao
    Su, Zhiba
    Wang, Wenxuan
    Lu, Yuhao
    Li, Zhen
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2019, 11672 : 273 - 278
  • [33] Progressive Neural Networks for Transfer Learning in Emotion Recognition
    Gideon, John
    Khorram, Soheil
    Aldeneh, Zakaria
    Dimitriadis, Dimitrios
    Provost, Emily Mower
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1098 - 1102
  • [34] An Investigation of Recurrent Neural Network Architectures for Statistical Parametric Speech Synthesis
    Achanta, Sivanand
    Godambe, Tejas
    Gangashetty, Suryakanth V.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 859 - 863
  • [35] Modeling Unvoiced Sounds In Statistical Parametric Speech Synthesis with a Continuous Vocoder
    Csapo, Tamas Gabor
    Nemeth, Geza
    Cernak, Milos
    Garner, Philip N.
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1338 - 1342
  • [36] Transfer learning for acoustic modeling of noise robust speech recognition
    Yi J.
    Tao J.
    Liu B.
    Wen Z.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2018, 58 (01): : 55 - 60
  • [37] Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
    Juvela, Lauri
    Wang, Xin
    Takaki, Shinji
    Airaksinen, Manu
    Yamagishi, Junichi
    Alku, Paavo
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2283 - 2287
  • [38] Statistical parametric speech synthesis for Ibibio
    Ekpenyong, Moses
    Urua, Eno-Abasi
    Watts, Oliver
    King, Simon
    Yamagishi, Junichi
    SPEECH COMMUNICATION, 2014, 56 : 243 - 251
  • [39] An introduction to statistical parametric speech synthesis
    King, Simon
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2011, 36 (05): : 837 - 852
  • [40] An introduction to statistical parametric speech synthesis
    King, Simon
    Sadhana, 2011, 36 : 837 - 852