Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Cited by: 2
Authors
Fu, Ruibo [1, 2]
Tao, Jianhua [1, 2, 3]
Zheng, Yibin [1, 2]
Wen, Zhengqi [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech synthesis; progressive neural networks; acoustic modeling; transfer learning;
DOI
10.21437/Interspeech.2018-1265
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The fundamental frequency and the spectral parameters of speech are correlated, so the mapping learned from the linguistic features to one of them can be leveraged to help predict the other. Conventional methods treat all acoustic features as a single stream for acoustic modeling, while multi-task learning methods model several targets under one global cost function. To improve the accuracy of the acoustic model, our method applies progressive deep neural networks (PDNN) to acoustic modeling in statistical parametric speech synthesis (SPSS). Each type of acoustic feature is modeled in a separate sub-network with its own cost function, and knowledge is transferred between sub-networks through lateral connections. The sub-networks in the PDNN are trained step by step so that each can reach its own optimum. Experiments compare the proposed PDNN-based SPSS system with standard DNN methods. The multi-task learning (MTL) method is also applied to both the PDNN and DNN structures as a contrast experiment to transfer learning. The computational complexity, the prediction order of the acoustic features, and the number of hierarchies of the PDNN are investigated. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
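The sub-network and lateral-connection idea in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the tanh activation, the choice of F0 as the first stream and spectrum as the second, and the placement of the lateral connection (from the first column's hidden layer into the second column's output layer) are all assumptions made for illustration, and every name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Return a (weights, bias) pair with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

# Hypothetical sizes, chosen only for illustration.
N_LING, N_HID, N_F0, N_SPEC = 8, 16, 1, 5

# Column 1: linguistic features -> first acoustic stream (assumed: F0).
col1 = {"hid": init_layer(N_LING, N_HID), "out": init_layer(N_HID, N_F0)}

# Column 2: linguistic features -> second stream (assumed: spectrum).
# "lat" is the lateral weight matrix that reads column 1's hidden layer.
col2 = {
    "hid": init_layer(N_LING, N_HID),
    "out": init_layer(N_HID, N_SPEC),
    "lat": rng.normal(0.0, 0.1, size=(N_HID, N_SPEC)),
}

def forward_col1(x):
    """Return column 1's hidden activation and its own prediction."""
    w, b = col1["hid"]
    h1 = np.tanh(x @ w + b)
    w, b = col1["out"]
    return h1, h1 @ w + b

def forward_col2(x, h1):
    """Predict the second stream, adding column 1's hidden layer laterally."""
    w, b = col2["hid"]
    h2 = np.tanh(x @ w + b)
    w, b = col2["out"]
    return h2 @ w + h1 @ col2["lat"] + b

# A batch of 4 frames of random stand-in linguistic features.
x = rng.normal(size=(4, N_LING))
h1, f0_pred = forward_col1(x)      # column 1 has its own output and cost
spec_pred = forward_col2(x, h1)    # column 2 reuses column 1's features
```

In a step-by-step training scheme of the kind the abstract describes, column 1 would be fitted first against its own cost function; column 2 would then be fitted against a separate cost while reading `h1`, typically with column 1's weights held fixed.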
Pages: 907-911
Page count: 5