Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

被引：8

作者：

Cooper, Erica ^{[1
]}

Lai, Cheng-, I ^{[2
]}

Yasuda, Yusuke ^{[1
]}

Yamagishi, Junichi ^{[1
]}

机构：

[1] Natl Inst Informat, Tokyo, Japan

[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA

来源：

INTERSPEECH 2020 | 2020年

关键词：

Speaker augmentation; Speech synthesis; dialect identification; channel modeling; transfer learning;

D O I：

10.21437/Interspeech.2020-1229

中图分类号：

R36 [病理学]; R76 [耳鼻咽喉科学];

学科分类号：

100104 ; 100213 ;

摘要：

Previous work on speaker adaptation for end-to-end speech synthesis still falls short in speaker similarity. We investigate an orthogonal approach to the current speaker adaptation paradigms, speaker augmentation, by creating artificial speakers and by taking advantage of low-quality data. The base Tacotron2 model is modified to account for the channel and dialect factors inherent in these corpora. In addition, we describe a warm-start training strategy that we adopted for Tacotron2 training. A large-scale listening test is conducted, and a distance metric is adopted to evaluate synthesis of dialects. This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach. Audio samples are available online(1).

引用

页码：3979 / 3983

页数：5

共 50 条

[1] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
Settle, Shane
Le Roux, Jonathan
Hori, Takaaki
Watanabe, Shinji
Hershey, John R.
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
[2] End-to-End Multilingual Multi-Speaker Speech Recognition
Seki, Hiroshi
Hori, Takaaki
Watanabe, Shinji
Le Roux, Jonathan
Hershey, John R.
[J]. INTERSPEECH 2019, 2019, : 3755 - 3759
[3] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
Chang, Xuankai
Zhang, Wangyou
Qian, Yanmin
Le Roux, Jonathan
Watanabe, Shinji
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
[4] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Denisov, Pavel
Ngoc Thang Vu
[J]. INTERSPEECH 2019, 2019, : 4425 - 4429
[5] END-TO-END MULTI-SPEAKER ASR WITH INDEPENDENT VECTOR ANALYSIS
Scheibler, Robin
Zhang, Wangyou
Chang, Xuankai
Watanabe, Shinji
Qian, Yanmin
[J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 496 - 501
[6] A Purely End-to-end System for Multi-speaker Speech Recognition
Seki, Hiroshi
Hori, Takaaki
Watanabe, Shinji
Le Roux, Jonathan
Hershey, John R.
[J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
[7] EXTENDED GRAPH TEMPORAL CLASSIFICATION FOR MULTI-SPEAKER END-TO-END ASR
Chang, Xuankai
Moritz, Niko
Hori, Takaaki
Watanabe, Shinji
Le Roux, Jonathan
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7322 - 7326
[8] END-TO-END MONAURAL MULTI-SPEAKER ASR SYSTEM WITHOUT PRETRAINING
Chang, Xuankai
Qian, Yanmin
Yu, Kai
Watanabe, Shinji
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6256 - 6260
[9] Real-time End-to-End Monaural Multi-speaker Speech Recognition
Li, Song
Ouyang, Beibei
Tong, Fuchuan
Liao, Dexin
Li, Lin
Hong, Qingyang
[J]. INTERSPEECH 2021, 2021, : 3750 - 3754
[10] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
Chang, Xuankai
Zhang, Wangyou
Qian, Yanmin
Le Roux, Jonathan
Watanabe, Shinji
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244

← 1 2 3 4 5 →