Low-Resource Speech Synthesis with Speaker-Aware Embedding

Cited by: 2

Authors:

Yang, Li-Jen [1]

Yeh, I-Ping [2]

Chien, Jen-Tzung [1]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan

[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS;

DOI:

10.1109/ISCSLP57327.2022.10038221

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech synthesis maps a text sequence to a speech waveform, and high-resource languages have been well studied because models can be learned from large amounts of paired text-speech data in public-domain corpora. Developing speech synthesis for low-resource languages, however, is challenging for speech communication in local regions because collecting training data is expensive. In particular, speaker-aware speech generation under low-resource settings is crucial in the real world, and the problem becomes more difficult when speaker-specific data are very limited. This paper presents speaker-aware speech synthesis under low-resource settings, based on a transformer encoder-decoder framework. Knowledge transfer is performed by incorporating a speaker-aware embedding: a transformer is first pretrained on multi-speaker data of a low-populated spoken language and then fine-tuned to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis show the merit of the speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.
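The abstract names Mel cepstral distortion (MCD) as an objective metric but this record does not give the formula the paper used. As a rough illustration only, the sketch below computes the commonly used definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames. It assumes the reference and synthesized frames are already time-aligned (for example by dynamic time warping) and that the 0th (energy) coefficient is excluded; the function name `mel_cepstral_distortion` is hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    ref: Sequence[Sequence[float]],
    syn: Sequence[Sequence[float]],
) -> float:
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: index 0 of each frame is the energy term and is skipped.
    """
    if len(ref) != len(syn) or len(ref) == 0:
        raise ValueError("ref and syn must contain the same, non-zero number of frames")

    # Constant factor of the common MCD definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref_frame, syn_frame in zip(ref, syn):
        if len(ref_frame) != len(syn_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D (0th excluded).
        squared = sum((a - b) ** 2 for a, b in zip(ref_frame[1:], syn_frame[1:]))
        total += scale * math.sqrt(squared)

    return total / len(ref)
```

Lower values indicate synthesized cepstra that are closer to the reference. Identical inputs give 0 dB.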


Pages: 235-239

Page count: 5

