SCALING AND BIAS CODES FOR MODELING SPEAKER-ADAPTIVE DNN-BASED SPEECH SYNTHESIS SYSTEMS

Cited by: 0
Authors
Luong, Hieu-Thi [1]
Yamagishi, Junichi [1, 2]
Affiliations
[1] Natl Inst Informat, Tokyo, Japan
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Keywords
speech synthesis; speaker adaptation; neural network; factorization; speaker code; ADAPTATION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most neural-network-based speaker-adaptive acoustic models for speech synthesis fall into one of two categories: layer-based or input-code approaches. Although each approach has its own pros and cons, most existing work on speaker adaptation focuses on improving one or the other. In this paper, we first systematically review the common principles of neural-network-based speaker-adaptive models, then show that these approaches can be represented in a unified framework and generalized further. More specifically, we introduce scaling and bias codes as a generalized means of speaker-adaptive transformation. Using these codes, we can create a more efficient factorized speaker-adaptive model that captures the advantages of both approaches while reducing their disadvantages. The experiments show that the proposed method improves speaker-adaptation performance compared with adaptation based on the conventional input code.
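The abstract does not give the exact formulation, so the following is only a rough sketch of what a speaker-dependent scale and bias applied to a hidden layer could look like. Every name here (`adapted_layer`, the projection matrices `A` and `B`, the code dimensions, the `1 + A @ scale_code` parameterization, and the `tanh` activation) is an assumption made for illustration and is not taken from the paper.

```python
import numpy as np

def adapted_layer(x, W, c, scale_code, bias_code, A, B):
    """Hidden layer with a speaker-dependent scale and bias (illustrative only).

    x          : input features, shape (d_in,)
    W, c       : speaker-independent weights (d_out, d_in) and bias (d_out,)
    scale_code : per-speaker scaling code, shape (k,)   [hypothetical]
    bias_code  : per-speaker bias code, shape (k,)      [hypothetical]
    A, B       : projections from code space to layer width, shape (d_out, k)
    """
    z = W @ x + c                    # shared, speaker-independent pre-activation
    scale = 1.0 + A @ scale_code     # assumed form: a zero code gives scale 1
    bias = B @ bias_code             # assumed form: a zero code gives bias 0
    return np.tanh(scale * z + bias)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
c = np.zeros(3)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))

# With all-zero codes the layer reduces to the unadapted network.
neutral = adapted_layer(x, W, c, np.zeros(2), np.zeros(2), A, B)
# Non-zero codes shift and rescale the hidden units for one speaker.
adapted = adapted_layer(x, W, c, np.array([0.5, -0.2]), np.array([0.1, 0.3]), A, B)
```

In this sketch, a bias code alone behaves like a conventional input code appended to the layer input, while the scaling code adds a multiplicative, layer-style adjustment. That is one plausible reading of how the two families named in the abstract could be combined, not a statement of the authors' actual model.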
Pages: 610-617
Page count: 8