Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5

Authors:

Maniati, Georgia [1]

Ellinas, Nikolaos [1]

Markopoulos, Konstantinos [1]

Vamvoukakis, Georgios [1]

Sung, June Sig [2]

Park, Hyoungmin [2]

Chalamandaris, Aimilios [1]

Tsiakoulis, Pirros [1]

Affiliations:

[1] Samsung Elect, Innoet, Athens, Greece

[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea

Source:

INTERSPEECH 2021 | 2021

Keywords:

cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; SPEECH SYNTHESIS; ACOUSTIC MODEL;

DOI:

10.21437/Interspeech.2021-327

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.
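
The core idea in the abstract, replacing phoneme identities with phonological features shared across languages, can be illustrated with a minimal sketch. The feature inventory, the phoneme table, and the function names below are hypothetical and chosen only for illustration; they are not the feature set or code used by the authors.

```python
# Toy feature inventory (an assumption for illustration, not the paper's set).
FEATURES = [
    "vowel", "consonant", "voiced", "nasal", "plosive", "fricative",
    "bilabial", "alveolar", "front", "back", "high", "rounded",
]

# Hypothetical phoneme-to-feature table. Phonemes from different languages
# are described with the same features, so they share one input space.
PHONEME_FEATURES = {
    "p": {"consonant", "plosive", "bilabial"},
    "b": {"consonant", "plosive", "bilabial", "voiced"},
    "m": {"consonant", "nasal", "bilabial", "voiced"},
    "s": {"consonant", "fricative", "alveolar"},
    "z": {"consonant", "fricative", "alveolar", "voiced"},
    "i": {"vowel", "voiced", "front", "high"},
    "u": {"vowel", "voiced", "back", "high", "rounded"},
    "y": {"vowel", "voiced", "front", "high", "rounded"},  # e.g. French "tu"
}


def encode_phoneme(phoneme):
    """Return a binary feature vector for one phoneme."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if name in active else 0 for name in FEATURES]


def encode_sequence(phonemes):
    """Encode a phoneme sequence as a list of binary feature vectors."""
    return [encode_phoneme(p) for p in phonemes]


def hamming(a, b):
    """Count the positions where two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for symbol in ["p", "b", "i", "y"]:
        print(symbol, encode_phoneme(symbol))
    # "y" may be absent from a training language, yet it differs from "i"
    # only in the "rounded" feature.
    print(hamming(encode_phoneme("i"), encode_phoneme("y")))
```

In this toy setup a phoneme that never occurs in the training languages still maps to a vector built from features the model has already seen, which is the property that makes such input representations attractive for cross-lingual synthesis.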

Citation

Pages: 1594-1598

Page count: 5
