ADAPTATION OF AN EXPRESSIVE SINGLE SPEAKER DEEP NEURAL NETWORK SPEECH SYNTHESIS SYSTEM

Authors
Parker, Jonathan [1, 2]
Stylianou, Yannis [2]
Cipolla, Roberto [1, 2]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Keywords
DNN; expressive speech; expressive speaker adaptation; expression transplantation
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
One advantage of statistical parametric speech synthesis is the ability to alter characteristics of the speech, such as the speaker or the expression. In this paper we present a technique for adapting an expressive single-speaker deep neural network (DNN) speech synthesis model to a new speaker, allowing both neutral and expressive speech in the new speaker's voice. Experiments show that, compared with another DNN speaker adaptation technique, the proposed technique achieves higher mean opinion scores (MOS) on both neutral and expressive speech. On expressive speech it also achieves higher speaker similarity scores, but slightly lower expression similarity scores.
Pages: 5309-5313
Number of pages: 5