Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5

Authors:

Maniati, Georgia [1]

Ellinas, Nikolaos [1]

Markopoulos, Konstantinos [1]

Vamvoukakis, Georgios [1]

Sung, June Sig [2]

Park, Hyoungmin [2]

Chalamandaris, Aimilios [1]

Tsiakoulis, Pirros [1]

Affiliations:

[1] Samsung Elect, Innoet, Athens, Greece

[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea

Source:

INTERSPEECH 2021 | 2021

Keywords:

cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; acoustic model

DOI:

10.21437/Interspeech.2021-327

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.


Pages: 1594-1598

Page count: 5
