Unsupervised Speaker and Expression Factorization for Multi-Speaker Expressive Synthesis of Ebooks

Cited by: 0
Authors
Chen, Langzhou [1]
Braunschweiler, Norbert [1]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge, England
Keywords
expressive speech synthesis; hidden Markov model; cluster adaptive training; factorization; audiobook
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This work aims to improve expressive speech synthesis of ebooks for multiple speakers by using training data from many audiobooks. Audiobooks contain a wide variety of expressive speaking styles which are often impractical to annotate. However, the speaker-expression factorization (SEF) framework, which has proven to be a powerful tool for speaker and expression modelling, usually requires supervised expression labels in the training data. This work presents an unsupervised SEF method which performs SEF on unlabelled training data within the framework of cluster adaptive training (CAT). The proposed method integrates expression clustering and parameter estimation into a single process that maximizes the likelihood of the training data. Experimental results indicate that it outperforms a cascade system of expression clustering followed by supervised SEF, and significantly improves the expressiveness of the synthetic speech of different speakers.
Pages: 1041-1045 (5 pages)
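The abstract only outlines the core idea: speaker labels are known, expression labels are not, so expression clustering and model parameter estimation are alternated in a single likelihood-maximizing loop. The following Python sketch illustrates that idea on a toy model only; the function name, the isotropic-Gaussian feature model, and the hard cluster assignments are assumptions made for illustration, not the paper's CAT-based HMM implementation.

```python
# Hypothetical sketch of unsupervised speaker-expression factorization (NOT the
# paper's system): speaker identities are supervised, expression clusters are
# latent, and assignment and parameter estimation alternate to raise likelihood.
# Plain per-utterance Gaussian feature vectors stand in for the acoustic model.

import numpy as np

def unsupervised_sef(features, speaker_ids, n_expr_clusters, n_iters=20, seed=0):
    """Jointly estimate speaker offsets, expression-cluster means and
    per-utterance expression assignments by alternating maximization.

    features    : (n_utts, dim) array, one mean feature vector per utterance
    speaker_ids : (n_utts,) integer speaker label per utterance (supervised)
    """
    rng = np.random.default_rng(seed)
    n_utts, dim = features.shape
    n_speakers = speaker_ids.max() + 1

    speaker_mean = np.zeros((n_speakers, dim))             # speaker factor
    expr_mean = rng.normal(size=(n_expr_clusters, dim))    # expression factor
    expr_assign = rng.integers(n_expr_clusters, size=n_utts)  # latent labels

    for _ in range(n_iters):
        # 1) Expression clustering step: for each utterance, pick the cluster
        #    that maximizes the isotropic-Gaussian likelihood given its speaker.
        residual = features - speaker_mean[speaker_ids]            # (n_utts, dim)
        dists = ((residual[:, None, :] - expr_mean[None, :, :]) ** 2).sum(-1)
        expr_assign = dists.argmin(axis=1)

        # 2) Parameter estimation step: re-estimate both factors given the
        #    current assignments (each update is the closed-form maximizer
        #    with the other factor held fixed).
        for s in range(n_speakers):
            mask = speaker_ids == s
            speaker_mean[s] = (features[mask] - expr_mean[expr_assign[mask]]).mean(0)
        for e in range(n_expr_clusters):
            mask = expr_assign == e
            if mask.any():
                expr_mean[e] = (features[mask] - speaker_mean[speaker_ids[mask]]).mean(0)

    return speaker_mean, expr_mean, expr_assign


if __name__ == "__main__":
    # Toy data: 3 speakers x 4 expressions, 40-dim "acoustic" vectors.
    rng = np.random.default_rng(1)
    true_spk = rng.normal(size=(3, 40))
    true_expr = rng.normal(size=(4, 40))
    spk = rng.integers(3, size=600)
    expr = rng.integers(4, size=600)
    x = true_spk[spk] + true_expr[expr] + 0.1 * rng.normal(size=(600, 40))

    _, _, assign = unsupervised_sef(x, spk, n_expr_clusters=4)
    print("recovered expression clusters (first 10 utts):", assign[:10])
```

Unlike a cascade approach (cluster expressions first, then run supervised SEF on the resulting labels), the assignments here are revisited on every iteration as the factor estimates improve, which is the point the abstract makes about integrating clustering and estimation in one process.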