Kernel eigenvoice speaker adaptation

Cited by: 33

Authors:

Mak, B [1]

Kwok, JT [1]

Ho, S [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 05

Keywords:

composite kernels; eigenvoice speaker adaptation; generalized EM algorithm; kernel eigenvoice speaker adaptation; kernel principal component analysis;

DOI:

10.1109/TSA.2005.851971

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 s, is available. At the heart of the method is principal component analysis (PCA), which is employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is to use composite kernels in such a way that state observation likelihoods can be computed using only kernel functions, without the need for a speaker-adapted model in the input supervector space. In this paper, we investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, it is found that KEV speaker adaptation is equally effective with both forms of composite Gaussian kernels, and that both outperform a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1 and 4.1 s of speech. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptation are not effective at all.
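The two composite kernels named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a speaker supervector is split into equal-length constituent vectors (for example, one per Gaussian mean) and that each constituent uses an isotropic Gaussian kernel with a single shared width `beta`. The function names, the equal-length split, and `beta` are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(a, b, beta=1.0):
    """Isotropic Gaussian kernel exp(-beta * ||a - b||^2) (assumed form)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-beta * np.dot(diff, diff)))


def split_supervector(x, num_parts):
    """Split a supervector into num_parts equal-length constituent vectors."""
    return np.split(np.asarray(x, dtype=float), num_parts)


def direct_sum_kernel(x, y, num_parts, beta=1.0):
    """Direct sum composite kernel: the sum of the constituent kernels."""
    parts = zip(split_supervector(x, num_parts), split_supervector(y, num_parts))
    return sum(gaussian_kernel(a, b, beta) for a, b in parts)


def tensor_product_kernel(x, y, num_parts, beta=1.0):
    """Tensor product composite kernel: the product of the constituent kernels."""
    parts = zip(split_supervector(x, num_parts), split_supervector(y, num_parts))
    return float(np.prod([gaussian_kernel(a, b, beta) for a, b in parts]))
```

Under these assumptions, the tensor product of Gaussian kernels with a shared `beta` reduces to a single Gaussian kernel on the whole supervector, while the direct sum keeps one additive term per constituent.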


Pages: 984-992

Number of pages: 9

50 items in total

[1] Using kernel PCA to improve eigenvoice speaker adaptation
Mak, B
Kwok, JT
Ho, S
PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 3062 - 3067
[2] Eigenvoice speaker adaptation via composite kernel PCA
Kwok, JT
Mak, B
Ho, S
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 1401 - 1408
[3] Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
Mak, Brian Kan-Wing
Hsiao, Roger Wend-Huu
Ho, Simon Ka-Lung
Kwok, James T.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04): : 1267 - 1280
[4] Study of various composite kernels for kernel eigenvoice speaker adaptation
Mak, B
Kwok, JT
Ho, S
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 325 - 328
[5] Speaker adaptation by hierarchical EigenVoice
Onishi, Yoshifumi
Iso, Ken-Ichi
ICASSP IEEE Int Conf Acoust Speech Signal Process Proc, (576-579):
[6] Speaker adaptation by hierarchical eigenvoice
Onishi, Y
Iso, K
2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 576 - 579
[7] A new eigenvoice approach to speaker adaptation
Huang, CH
Chien, JT
Wang, HM
2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2004, : 109 - 112
[8] Rapid speaker adaptation in eigenvoice space
Kuhn, R
Junqua, JC
Nguyen, P
Niedzielski, N
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (06): : 695 - 707
[9] Various reference speakers determination methods for embedded kernel Eigenvoice speaker adaptation
Mak, B
Ho, S
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 981 - 984
[10] Feature space eigenvoice speaker adaptation
Institute of Information Systems Engineering, Information Engineering University, Zhengzhou 450000, China
Zidonghua Xuebao Acta Auto. Sin., 7 (1244-1252):
