Speech synthesis of Shanghai dialect based on DNN and LSTM-RNN

Cited by: 0
Authors
You, Yuren [1]
Zhou, Yun [1]
Yang, Hongwu [1]
Wang, Hui [1]
Chen, Lijia [1]
Affiliation
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Shanghai dialect; Speech synthesis; Deep neural network; Long short-term memory network
DOI
10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00188
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a deep-learning-based speech synthesis method for the Shanghai dialect. We first build a Shanghai dialect speech corpus for model training. We also implement a text analyzer that extracts context-dependent information for the Shanghai dialect from Chinese sentences. Finally, we apply both a deep neural network (DNN)-based method and a long short-term memory recurrent neural network (LSTM-RNN)-based method to synthesize Shanghai dialect speech. Subjective and objective experimental results show that the proposed method synthesizes Shanghai dialect speech with good voice quality, and that speech synthesized by the LSTM-RNN-based method has better voice quality than speech synthesized by the DNN-based method.
Pages: 1309-1315
Number of pages: 7