Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5
Authors
Maniati, Georgia [1]
Ellinas, Nikolaos [1]
Markopoulos, Konstantinos [1]
Vamvoukakis, Georgios [1]
Sung, June Sig [2]
Park, Hyoungmin [2]
Chalamandaris, Aimilios [1]
Tsiakoulis, Pirros [1]
Affiliations
[1] Samsung Elect, Innoet, Athens, Greece
[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea
Source
INTERSPEECH 2021 | 2021
Keywords
cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; acoustic model
DOI
10.21437/Interspeech.2021-327
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has recently been proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality in both cases, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.
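The central idea in the abstract, describing each phoneme by phonological features shared across languages rather than by a language-specific phoneme ID, can be illustrated with a minimal sketch. The feature names, the small phoneme inventory, and the helpers `encode`, `encode_sequence` and `similarity` are hypothetical and chosen only for illustration; they are not the feature set, model input, or similarity measure used in the paper. The Jaccard overlap in `similarity` is a simple stand-in for "phonological similarity".

```python
from __future__ import annotations

# Hypothetical, heavily simplified feature inventory for illustration only;
# it is NOT the phonological feature set used in the paper.
FEATURES = [
    "consonant", "vowel", "voiced",
    "bilabial", "alveolar", "velar",              # place of articulation
    "plosive", "fricative", "nasal",              # manner of articulation
    "front", "back", "close", "open", "rounded",  # vowel qualities
]

# Each IPA phoneme is described by the subset of FEATURES it carries.
# Because all languages draw on the same feature names, phonemes from
# different languages share one language-agnostic input space.
PHONEME_FEATURES = {
    "p": {"consonant", "bilabial", "plosive"},
    "b": {"consonant", "voiced", "bilabial", "plosive"},
    "m": {"consonant", "voiced", "bilabial", "nasal"},
    "t": {"consonant", "alveolar", "plosive"},
    "d": {"consonant", "voiced", "alveolar", "plosive"},
    "s": {"consonant", "alveolar", "fricative"},
    "z": {"consonant", "voiced", "alveolar", "fricative"},
    "k": {"consonant", "velar", "plosive"},
    "x": {"consonant", "velar", "fricative"},  # e.g. Greek, Spanish
    "i": {"vowel", "voiced", "front", "close"},
    "u": {"vowel", "voiced", "back", "close", "rounded"},
    "a": {"vowel", "voiced", "open"},
    "y": {"vowel", "voiced", "front", "close", "rounded"},  # e.g. French, German
}


def encode(phoneme: str) -> list[int]:
    """Return a multi-hot vector over FEATURES for one phoneme."""
    feats = PHONEME_FEATURES[phoneme]  # raises KeyError for unlisted phonemes
    unknown = feats - set(FEATURES)
    if unknown:
        raise ValueError(f"undeclared features for {phoneme!r}: {sorted(unknown)}")
    return [1 if f in feats else 0 for f in FEATURES]


def encode_sequence(phonemes: list[str]) -> list[list[int]]:
    """Encode a phoneme sequence as a (length x len(FEATURES)) input matrix."""
    return [encode(p) for p in phonemes]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of two phonemes' feature sets (illustrative stand-in)."""
    fa, fb = PHONEME_FEATURES[a], PHONEME_FEATURES[b]
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    # /y/ is absent from English, yet it still maps to a valid feature vector.
    print(encode("y"))
    print(similarity("y", "i"))  # /y/ and /i/ share 4 of their 5 features
    print(similarity("y", "u"))
```

For instance, a model trained only on languages without the front rounded vowel /y/ still receives a well-formed input for it, and `similarity("y", "i")` is 0.8 because the two vowels differ only in rounding. This shared feature space is what makes text in an unseen language expressible as model input at all; how well a trained model actually generalizes to it is the empirical question the abstract addresses.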
Pages: 1594-1598
Number of pages: 5
Related papers (50 in total)
  • [41] FonBund: A Library for Combining Cross-lingual Phonological Segment Data
    Gutkin, Alexander
    Jansche, Martin
    Merkulova, Tatiana
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, pp. 2236-2240
  • [42] Cross-lingual, Multi-speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
    Chen, Mengnan
    Chen, Minchuan
    Liang, Shuang
    Ma, Jun
    Chen, Lei
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2019, 2019, pp. 2105-2109
  • [43] Unsupervised Cross-Lingual Adaptation of Dependency Parsers Using CRF Autoencoders
    Li, Zhao
    Tu, Kewei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, pp. 2127-2133
  • [44] SpeakerNet for Cross-lingual Text-Independent Speaker Verification
    Habib, Hafsa
    Tauseef, Huma
    Fahiem, Muhammad Abuzar
    Farhan, Saima
    Usman, Ghousia
    ARCHIVES OF ACOUSTICS, 2020, Vol. 45, No. 4, pp. 573-583
  • [45] State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
    Wu, Yi-Jian
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, pp. 516-519
  • [46] A COMPARISON OF SUPERVISED AND UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION APPROACHES FOR HMM-BASED SPEECH SYNTHESIS
    Liang, Hui
    Dines, John
    Saheer, Lakshmi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, pp. 4598-4601
  • [47] UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR
    Wang, Wei
    Qian, Yanmin
    INTERSPEECH 2023, 2023, pp. 2253-2257
  • [48] Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
    Effland, Thomas
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, Vol. 11, pp. 122-138
  • [49] Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
    Zhou, Shuyan
    Rijhwani, Shruti
    Wieting, John
    Carbonell, Jaime
    Neubig, Graham
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, Vol. 8, pp. 109-124
  • [50] Cross-lingual transfer learning during supervised training in low resource scenarios
    Das, Amit
    Hasegawa-Johnson, Mark
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, pp. 3531-3535