Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5
Authors
Maniati, Georgia [1]
Ellinas, Nikolaos [1]
Markopoulos, Konstantinos [1]
Vamvoukakis, Georgios [1]
Sung, June Sig [2]
Park, Hyoungmin [2]
Chalamandaris, Aimilios [1]
Tsiakoulis, Pirros [1]
Affiliations
[1] Samsung Elect, Innoet, Athens, Greece
[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea
Source
INTERSPEECH 2021 | 2021
Keywords
cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; acoustic model
DOI
10.21437/Interspeech.2021-327
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.
Pages: 1594-1598
Number of pages: 5
Related papers
50 in total
  • [21] Universal Cross-Lingual Data Generation for Low Resource ASR
    Wang, Wei
    Qian, Yanmin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 973 - 983
  • [22] Cross-lingual intent classification in a low resource industrial setting
    Khalil, Talaat
    Kielczewski, Kornel
    Chouliaras, Georgios Christos
    Keldibek, Amina
    Versteegh, Maarten
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 6419 - 6424
  • [23] Cross-Lingual Morphological Tagging for Low-Resource Languages
    Buys, Jan
    Botha, Jan A.
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1954 - 1964
  • [24] Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer
    Feng, Xiaocheng
    Feng, Xiachong
    Qin, Bing
    Feng, Zhangyin
    Liu, Ting
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4071 - 4077
  • [25] Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation
    Oliveira, Viviane de Franca
    Shiota, Sayaka
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 982 - 985
  • [26] Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
    Taghizadeh, Nasrin
    Faili, Hesham
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 56 : 61 - 87
  • [27] Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation With Limited Data
    Sarfjoo, Seyyed Saeed
    Demiroglu, Cenk
    King, Simon
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (04) : 839 - 851
  • [28] Cross-lingual Speaker Verification with Deep Feature Learning
    Li, Lantian
    Wang, Dong
    Rozi, Askar
    Zheng, Thomas Fang
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1040 - 1044
  • [29] Cross-Lingual Speaker Verification Based on Linear Transform
    Askar, Rozi
    Wang, Dong
    Bie, Fanhu
    Wang, Jun
    Zheng, Thomas Fang
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 519 - 523
  • [30] An Analysis of Language Mismatch in HMM State Mapping-Based Cross-Lingual Speaker Adaptation
    Liang, Hui
    Dines, John
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 622 - 625