A COMPARISON OF SUPERVISED AND UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION APPROACHES FOR HMM-BASED SPEECH SYNTHESIS

被引：17

作者：

Liang, Hui ^{[1
]}

Dines, John ^{[1
]}

Saheer, Lakshmi ^{[1
]}

机构：

[1] Idiap Res Inst, Martigny, Switzerland

来源：

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010年

关键词：

unsupervised cross-lingual speaker adaptation; decision tree marginalization; HMM state mapping; ALGORITHMS;

D O I：

10.1109/ICASSP.2010.5495559

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

The EMIME project aims to build a personalized speech-to-speech translator, such that spoken input of a user in one language is used to produce spoken output that still sounds like the user's voice however in another language. This distinctiveness makes unsupervised cross-lingual speaker adaptation one key to the project's success. So far, research has been conducted into unsupervised and cross-lingual cases separately by means of decision tree marginalization and HMM state mapping respectively. In this paper we combine the two techniques to perform unsupervised cross-lingual speaker adaptation. The performance of eight speaker adaptation systems (supervised vs. unsupervised, intra-lingual vs. cross-lingual) is compared using objective and subjective evaluations. Experimental results show the performance of unsupervised cross-lingual speaker adaptation is comparable to that of the supervised case in terms of spectrum adaptation in the EMIME scenario, even though automatically obtained transcriptions have a very high phoneme error rate.

引用

页码：4598 / 4601

页数：4

共 50 条

[1] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
Oura, Keiichiro
Tokuda, Keiichi
Yamagishi, Junichi
King, Simon
Wester, Mirjam
[J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4594 - 4597
[2] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
Wu, Yi-Jian
King, Simon
Tokuda, Keiichi
[J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 9 - 12
[3] Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
Dines, John
Liang, Hui
Saheer, Lakshmi
Gibson, Matthew
Byrne, William
Oura, Keiichiro
Tokuda, Keiichi
Yamagishi, Junichi
King, Simon
Wester, Mirjam
Hirsimaki, Teemu
Karhila, Reima
Kurimo, Mikko
[J]. COMPUTER SPEECH AND LANGUAGE, 2013, 27 (02): : 420 - 437
[4] Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation
Oliveira, Viviane de Franca
Shiota, Sayaka
Nankaku, Yoshihiko
Tokuda, Keiichi
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 982 - 985
[5] State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
Wu, Yi-Jian
Nankaku, Yoshihiko
Tokuda, Keiichi
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 516 - 519
[6] Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
Oura, Keiichiro
Yamagishi, Junichi
Wester, Mirjam
King, Simon
Tokuda, Keiichi
[J]. SPEECH COMMUNICATION, 2012, 54 (06) : 703 - 714
[7] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION
Gibson, Matthew
Hirsimaki, Teemu
Karhila, Reima
Kurimo, Mikko
Byrne, William
[J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4642 - 4645
[8] Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction
Gibson, Matthew
Byrne, William
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 895 - 904
[9] Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices
Peng, Xianglin
Oura, Keiichiro
Nankaku, Yoshihiko
Tokuda, Keiichi
[J]. 2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 605 - 608
[10] Unsupervised adaptation for HMM-based speech synthesis
King, Simon
Tokuda, Keiichi
Zen, Heiga
Yamagishi, Junichi
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1869 - +

← 1 2 3 4 5 →