A study of speaker adaptation for DNN-based speech synthesis

被引:0
|
作者
Wu, Zhizheng [1 ]
Swietojanski, Pawel [1 ]
Veaux, Christophe [1 ]
Renals, Steve [1 ]
King, Simon [1 ]
机构
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
关键词
Speech synthesis; acoustic model; deep neural network; speaker adaptation;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
A major advantage of statistical parametric speech synthesis (SPSS) over unit-selection speech synthesis is its adaptability and controllability in changing speaker characteristics and speaking style. Recently, several studies using deep neural networks (DNNs) as acoustic models for SPSS have shown promising results. However, the adaptability of DNNs in SPSS has not been systematically studied. In this paper, we conduct an experimental analysis of speaker adaptation for DNN-based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.
引用
收藏
页码:879 / 883
页数:5
相关论文
共 50 条
  • [1] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
  • [2] Speaker adaptation in DNN-based speech synthesis using d-vectors
    Doddipatla, Rama
    Braunschweiler, Norbert
    Maia, Ranniery
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3404 - 3408
  • [3] Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes
    Takaki, Shinji
    Nishimura, Yoshikazu
    Yamagishi, Junichi
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 649 - 658
  • [4] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5135 - 5139
  • [5] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 462 - 472
  • [6] An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2278 - 2282
  • [7] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [8] DNN-BASED SPEAKER-ADAPTIVE POSTFILTERING WITH LIMITED ADAPTATION DATA FOR STATISTICAL SPEECH SYNTHESIS SYSTEMS
    Ozturk, Mirac Goksu
    Ulusoy, Okan
    Demiroglu, Cenk
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7030 - 7034
  • [9] DNN-Based Arabic Speech Synthesis
    Amrouche, Aissa
    Bentrcia, Youssouf
    Boubakeur, Khadidja Nesrine
    Abed, Ahcene
    [J]. 2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022, : 378 - 382
  • [10] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5540 - 5544