DEEP NEURAL NETWORKS EMPLOYING MULTI-TASK LEARNING AND STACKED BOTTLENECK FEATURES FOR SPEECH SYNTHESIS

Cited by: 0
Authors
Wu, Zhizheng [1 ]
Valentini-Botinhao, Cassia [1 ]
Watts, Oliver [1 ]
King, Simon [1 ]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Speech synthesis; acoustic model; multi-task learning; deep neural network; bottleneck feature; HMM;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Deep neural networks (DNNs) use a cascade of hidden representations to enable the learning of complex mappings from input to output features. They are able to learn the complex mapping from text-based linguistic features to speech acoustic features, and so perform text-to-speech synthesis. Recent results suggest that DNNs can produce more natural synthetic speech than conventional HMM-based statistical parametric systems. In this paper, we show that the hidden representation used within a DNN can be improved through the use of Multi-Task Learning, and that stacking multiple frames of hidden layer activations (stacked bottleneck features) also leads to improvements. Experimental results confirmed the effectiveness of the proposed methods, and in listening tests we find that stacked bottleneck features in particular offer a significant improvement over both a baseline DNN and a benchmark HMM system.
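The abstract's two ideas can be sketched in NumPy, with hypothetical layer sizes and random weights that are not taken from the paper: a DNN whose shared bottleneck layer feeds both a primary acoustic head and a secondary-task head (multi-task learning), and the stacking of bottleneck activations from neighbouring frames to form the input of a second-stage network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only): 60 linguistic inputs,
# 32-unit bottleneck layer, 40 acoustic outputs, 40 secondary-task outputs.
D_IN, D_BN, D_MAIN, D_AUX = 60, 32, 40, 40

# One shared (bottleneck) weight matrix, plus one weight matrix per task head.
W_bn = rng.standard_normal((D_IN, D_BN)) * 0.1
W_main = rng.standard_normal((D_BN, D_MAIN)) * 0.1
W_aux = rng.standard_normal((D_BN, D_AUX)) * 0.1

def forward(x):
    """Multi-task forward pass: one shared representation, two heads."""
    h = np.tanh(x @ W_bn)              # shared bottleneck activations
    return h, h @ W_main, h @ W_aux    # bottleneck, primary, secondary

def stack_bottlenecks(H, context=4):
    """Concatenate bottleneck activations of neighbouring frames.

    For each frame t, frames t-context .. t+context are stacked (edges
    clamped to the sequence boundary), giving a (2*context+1)*D_BN
    dimensional input for a second-stage acoustic network.
    """
    T = H.shape[0]
    idx = np.clip(np.arange(-context, context + 1)[None, :]
                  + np.arange(T)[:, None], 0, T - 1)
    return H[idx].reshape(T, -1)

X = rng.standard_normal((100, D_IN))   # 100 frames of linguistic features
H, y_main, y_aux = forward(X)
S = stack_bottlenecks(H, context=4)
print(H.shape, S.shape)                # (100, 32) (100, 288)
```

In the multi-task setting, both heads would be trained jointly so that the secondary task regularises the shared bottleneck; at synthesis time only the primary head (or the stacked bottleneck features) is used.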
Pages: 4460 - 4464
Page count: 5