Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Cited by: 5

Authors:

Maniati, Georgia ^{[1]}

Ellinas, Nikolaos ^{[1]}

Markopoulos, Konstantinos ^{[1]}

Vamvoukakis, Georgios ^{[1]}

Sung, June Sig ^{[2]}

Park, Hyoungmin ^{[2]}

Chalamandaris, Aimilios ^{[1]}

Tsiakoulis, Pirros ^{[1]}

Affiliations:

[1] Samsung Elect, Innoet, Athens, Greece

[2] Samsung Elect, Mobile Commun Business, Seoul, South Korea

Source:

INTERSPEECH 2021 | 2021

Keywords:

cross-lingual; multilingual; speaker adaptation; speech synthesis; low resource; acoustic model

DOI:

10.21437/Interspeech.2021-327

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we finetune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in both the seen and unseen adaptation language scenarios.


Pages: 1594-1598

Page count: 5
