Low-Resource Speech Synthesis with Speaker-Aware Embedding

被引:2
|
作者
Yang, Li-Jen [1 ]
Yeh, I-Ping [2 ]
Chien, Jen-Tzung [1 ]
机构
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan
关键词
low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS;
D O I
10.1109/ISCSLP57327.2022.10038221
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech synthesis has been successfully exploited for mapping from text sequence to speech waveform where high-resource languages have been well studied and learned from a large amount of text-speech paired data in public-domain corpora. However, developing speech synthesis under low-resource languages is challenging for speech communication in local regions since the collection of training data is expensive. In particular, the speaker-aware speech generation under low-resource settings is crucial in real world. Such a problem is increasingly difficult in case of very limited speaker-specific data. This paper presents a speaker-aware speech synthesis under low-resource settings based on an encoder-decoder framework by using transformer. Knowledge transfer is performed by incorporating a speaker-aware embedding through first learning a pretrained transformer from multi-speaker data of a low-populated spoken language and then fine-tuning the transformer to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis are evaluated to show the merit of speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.
引用
收藏
页码:235 / 239
页数:5
相关论文
共 50 条
  • [41] Linguistic Foundations of Low-Resource Languages for Speech Synthesis on the Example of the Kazakh Language
    Bekmanova, Gulmira
    Yergesh, Banu
    Sharipbay, Altynbek
    Omarbekova, Assel
    Zakirova, Alma
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2022 WORKSHOPS, PART III, 2022, 13379 : 3 - 14
  • [42] Efficient neural speech synthesis for low-resource languages through multilingual modeling
    de Korte, Marcel
    Kim, Jaebok
    Klabbers, Esther
    INTERSPEECH 2020, 2020, : 2967 - 2971
  • [43] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [44] Low-Resource Autodiacritization of Abjads for Speech Keyword Search
    Schone, Patrick
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 741 - 744
  • [45] DEEP MAXOUT NETWORKS FOR LOW-RESOURCE SPEECH RECOGNITION
    Miao, Yajie
    Metze, Florian
    Rawat, Shourabh
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 398 - 403
  • [46] Frontier Research on Low-Resource Speech Recognition Technology
    Slam, Wushour
    Li, Yanan
    Urouvas, Nurmamet
    SENSORS, 2023, 23 (22)
  • [47] Optimizing Data Usage for Low-Resource Speech Recognition
    Qian, Yanmin
    Zhou, Zhikai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 394 - 403
  • [48] ADVERSARIAL MULTILINGUAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4899 - 4903
  • [49] Speech recognition datasets for low-resource Congolese languages
    Kimanuka, Ussen
    Maina, Ciira wa
    Buyuk, Osman
    DATA IN BRIEF, 2024, 52
  • [50] AUTOMATIC RATING OF SPONTANEOUS SPEECH FOR LOW-RESOURCE LANGUAGES
    Al-Ghezi, Ragheb
    Getman, Yaroslav
    Voskoboinik, Ekaterina
    Singh, Mittul
    Kurimo, Mikko
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 339 - 345