Low-Resource Speech Synthesis with Speaker-Aware Embedding

Cited by: 2
Authors
Yang, Li-Jen [1]
Yeh, I-Ping [2]
Chien, Jen-Tzung [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan
Keywords
low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS;
DOI
10.1109/ISCSLP57327.2022.10038221
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech synthesis has been successfully used to map a text sequence to a speech waveform. High-resource languages have been well studied, with models learned from large amounts of paired text-speech data in public-domain corpora. Developing speech synthesis for low-resource languages, however, is challenging for speech communication in local regions because collecting training data is expensive. In particular, speaker-aware speech generation under low-resource settings is crucial in the real world, and the problem becomes more difficult when speaker-specific data are very limited. This paper presents speaker-aware speech synthesis under low-resource settings based on a transformer encoder-decoder framework. Knowledge transfer is performed by incorporating a speaker-aware embedding: a transformer is first pretrained on multi-speaker data of a low-populated spoken language and then fine-tuned to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis show the merit of the speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.
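The abstract names Mel cepstral distortion (MCD) as one of its two evaluation measures. As a rough illustration only, the sketch below computes the commonly used form of MCD in dB between reference and synthesized mel-cepstral frames. It assumes the two sequences are already time-aligned (for example by dynamic time warping) and that the 0th (energy) coefficient is excluded; the record does not state the cepstral order or alignment used in the paper, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. The 0th
    coefficient is skipped, which is an assumption of this sketch.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("expected two non-empty, equally long frame sequences")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


if __name__ == "__main__":
    reference = [[0.0, 1.0, 0.5], [0.0, 0.8, 0.4]]
    synthesized = [[0.0, 0.9, 0.5], [0.0, 0.8, 0.2]]
    print(f"MCD: {mel_cepstral_distortion(reference, synthesized):.3f} dB")
```

Lower values indicate that the synthesized cepstra are closer to the reference; mean opinion score, the other measure named in the abstract, comes from human listening tests and has no comparable formula.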
Pages: 235-239
Number of pages: 5