Kernel eigenvoice speaker adaptation

Cited by: 33
Authors
Mak, B [1]
Kwok, JT [1]
Ho, S [1]
Affiliations
[1] Hong Kong University of Science and Technology, Department of Computer Science, Hong Kong, People's Republic of China
Source
Keywords
composite kernels; eigenvoice speaker adaptation; generalized EM algorithm; kernel eigenvoice speaker adaptation; kernel principal component analysis
DOI
10.1109/TSA.2005.851971
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 s, is available. At the heart of the method is principal component analysis (PCA), which is employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is the use of composite kernels in such a way that state observation likelihoods can be computed using only kernel functions, without the need for a speaker-adapted model in the input supervector space. In this paper, we investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, it is found that KEV speaker adaptation using either form of composite Gaussian kernel is equally effective, and both outperform a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1 and 4.1 s of speech. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptation are not effective at all.
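The two composite kernels named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact kernel definitions or parameters, so the code below only shows the generic textbook construction: a speaker supervector is split into equal-sized constituent vectors (one per Gaussian), a Gaussian kernel is applied to each pair of constituents, and the results are combined by summation (direct sum) or multiplication (tensor product). The function names, the single shared width parameter `beta`, and the equal-sized split are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, beta=1.0):
    """Gaussian (RBF) kernel exp(-beta * ||x - y||^2) on two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-beta * sq_dist)


def split_supervector(supervector, num_gaussians):
    """Split a supervector into num_gaussians equal-sized constituent vectors.

    Assumption: every constituent (e.g. a Gaussian mean) has the same dimension.
    """
    dim, rem = divmod(len(supervector), num_gaussians)
    if rem != 0:
        raise ValueError("supervector length must be a multiple of num_gaussians")
    return [supervector[r * dim:(r + 1) * dim] for r in range(num_gaussians)]


def direct_sum_kernel(x, y, num_gaussians, beta=1.0):
    """Direct sum composite kernel: sum of the constituent kernels."""
    xs = split_supervector(x, num_gaussians)
    ys = split_supervector(y, num_gaussians)
    return sum(gaussian_kernel(a, b, beta) for a, b in zip(xs, ys))


def tensor_product_kernel(x, y, num_gaussians, beta=1.0):
    """Tensor product composite kernel: product of the constituent kernels."""
    xs = split_supervector(x, num_gaussians)
    ys = split_supervector(y, num_gaussians)
    result = 1.0
    for a, b in zip(xs, ys):
        result *= gaussian_kernel(a, b, beta)
    return result


# Two toy "supervectors", each made of 2 constituents of dimension 2.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.5, 1.0, 2.0, 2.0]
k_sum = direct_sum_kernel(x, y, num_gaussians=2)
k_prod = tensor_product_kernel(x, y, num_gaussians=2)
```

In both cases the composite value is built from per-constituent kernel evaluations, which is the property the abstract relies on when it says likelihoods can be computed from kernel functions alone. Note that with one shared `beta`, the product of Gaussian constituent kernels equals a single Gaussian kernel on the whole supervector; constituent-specific widths would break that equivalence.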
Pages: 984-992
Number of pages: 9
Related papers
50 in total
  • [21] Speaker Segmentation System Using Eigenvoice-based Speaker Weight Distance Method
    Choi, Mu Yeol
    Kim, Hyung Soon
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2012, 31 (04): 266-272
  • [22] Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors
    Diez, Mireia
    Burget, Lukas
    Landini, Federico
    Cernocky, Jan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 (28): 355-368
  • [23] ENHANCEMENTS OF MAXIMUM LIKELIHOOD EIGEN-DECOMPOSITION USING FUZZY LOGIC CONTROL FOR EIGENVOICE-BASED SPEAKER ADAPTATION
    Ding, Ing-Jr
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2011, 7 (7B): 4207-4222
  • [24] Eigenvoice Speaker Adaptation with Minimal Data for Statistical Speech Synthesis Systems Using a MAP Approach and Nearest-Neighbors
    Mohammadi, Amir
    Sarfjoo, Seyyed Saeed
    Demiroglu, Cenk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12): 2146-2157
  • [25] Robust Speaker Recognition System Employing Covariance Matrix and Eigenvoice
    Sapijaszko, Genevieve I.
    Mikhael, Wasfy B.
    2013 IEEE 56TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2013: 1116-1119
  • [26] Speech separation using speaker-adapted eigenvoice speech models
    Weiss, Ron J.
    Ellis, Daniel P. W.
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01): 16-29
  • [27] Text-Independent Voice Conversion Based on Kernel Eigenvoice
    Li, Yanping
    Zhang, Linghua
    Ding, Hui
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT I, 2010, 6319: 432-+
  • [28] Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models
    Wang, D.
    Vogt, R.
    Sridharan, S.
    Dean, D.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 964-967
  • [29] A non-linear speaker adaptation technique using kernel ridge regression
    Saon, George
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006: 225-228
  • [30] Eigenvoice modelling for cross likelihood ratio based speaker clustering: A Bayesian approach
    Wang, David
    Vogt, Robert
    Sridharan, Sridha
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (04): 1011-1027