DNN-Based Speech Synthesis: Importance of Input Features and Training Data

Cited by: 4
Authors
Lazaridis, Alexandros [1]
Potard, Blaise [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Keywords
Text-to-speech synthesis; Statistical parametric synthesis; Deep neural networks; Hidden Markov models
DOI
10.1007/978-3-319-23132-7_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have recently been introduced in speech synthesis. This paper investigates the importance of input features and training data in speaker-dependent (SD) DNN-based speech synthesis. Various aspects of the DNN training procedure are examined. Additionally, training sets of several different sizes (13.5, 3.6 and 1.5 h of speech) are evaluated.
Pages: 193-200
Number of pages: 8