Low-Resource Speech Synthesis with Speaker-Aware Embedding

Cited by: 2
Authors
Yang, Li-Jen [1]
Yeh, I-Ping [2]
Chien, Jen-Tzung [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan
Keywords
low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS;
DOI
10.1109/ISCSLP57327.2022.10038221
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech synthesis maps a text sequence to a speech waveform. It has been well studied for high-resource languages, where models are learned from large amounts of paired text-speech data in public-domain corpora. Developing speech synthesis for low-resource languages, which matters for speech communication in local regions, is challenging because collecting training data is expensive. Speaker-aware speech generation under low-resource settings is particularly important in real-world use, and the problem becomes harder when speaker-specific data are very limited. This paper presents speaker-aware speech synthesis under low-resource settings based on a transformer encoder-decoder framework. Knowledge transfer is performed by incorporating a speaker-aware embedding: a transformer is first pretrained on multi-speaker data of a spoken language with a small speaker population, and is then fine-tuned to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis show the merit of the speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.
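The abstract names Mel cepstral distortion (MCD) as one of its objective measures. As a minimal sketch of the commonly used definition, and not necessarily the exact configuration used in the paper, the function below averages the per-frame distortion in dB over two sequences of mel-cepstral vectors. The function name and the input layout are hypothetical, and the sketch assumes the frames are already time-aligned and that the 0th (energy) coefficient has been removed.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Return the mean MCD in dB over time-aligned frames.

    Assumptions (not taken from the paper): both inputs are sequences of
    frames, each frame is a sequence of mel-cepstral coefficients, the
    frames are already aligned one-to-one, and the 0th coefficient has
    been dropped by the caller.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("expected two non-empty, equally long frame sequences")

    # Standard scaling constant: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance between the two coefficient vectors.
        squared_distance = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(squared_distance)

    return total / len(ref_frames)
```

Lower values indicate that the synthesized spectrum is closer to the reference. In practice the two utterances usually differ in length, so an alignment step such as dynamic time warping would be needed before calling a function like this one.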
Pages: 235-239
Number of pages: 5