Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5
Authors
Maniati, Georgia [1 ]
Ellinas, Nikolaos [1 ]
Markopoulos, Konstantinos [1 ]
Vamvoukakis, Georgios [1 ]
Sung, June Sig [2 ]
Park, Hyoungmin [2 ]
Chalamandaris, Aimilios [1 ]
Tsiakoulis, Pirros [1 ]
Affiliations
[1] Samsung Elect, Innoet, Athens, Greece
[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea
Source
Keywords
cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; SPEECH SYNTHESIS; ACOUSTIC MODEL;
DOI
10.21437/Interspeech.2021-327
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.
Pages: 1594 - 1598
Number of pages: 5
Related papers
50 in total
  • [1] Phonological Knowledge Guided HMM State Mapping for Cross-Lingual Speaker Adaptation
    Liang, Hui
    Dines, John
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1836 - +
  • [2] ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
    Casanova, Edresson
    Shulby, Christopher
    Korolev, Alexander
    Candido Junior, Arnaldo
    Soares, Anderson da Silva
    Aluisio, Sandra
    Ponti, Moacir Antonelli
    INTERSPEECH 2023, 2023, : 1244 - 1248
  • [3] Cross-Lingual Consistency of Phonological Features: An Empirical Study
    Johny, Cibu
    Gutkin, Alexander
    Jansche, Martin
    INTERSPEECH 2019, 2019, : 1741 - 1745
  • [4] Cross-lingual Speaker Adaptation using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis
    Xin, Detai
    Saito, Yuki
    Takamichi, Shinnosuke
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2021, 2021, : 1614 - 1618
  • [5] Cross-lingual speaker adaptation using domain adaptation and speaker consistency loss for text-to-speech synthesis
    Xin, Detai
    Saito, Yuki
    Takamichi, Shinnosuke
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2021, 5 : 3376 - 3380
  • [6] Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
    Saffjoo, Seyyed Saeed
    Demiroglu, Cenk
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 317 - 321
  • [7] Cross-lingual Speaker Adaptation via Gaussian Component Mapping
    Cao, Houwei
    Lee, Tan
    Ching, P. C.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 869 - 872
  • [8] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Wu, Yi-Jian
    King, Simon
    Tokuda, Keiichi
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 9 - 12
  • [9] Cross-lingual Adaptation Using Universal Dependencies
    Taghizadeh, Nasrin
    Faili, Heshaam
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (04)
  • [10] CAM: A cross-lingual adaptation framework for low-resource language speech recognition
    Hu, Qing
    Zhang, Yan
    Zhang, Xianlei
    Han, Zongyu
    Yu, Xilong
    INFORMATION FUSION, 2024, 111