Kernel eigenvoice speaker adaptation

Cited by: 33
Authors
Mak, B [1]
Kwok, JT [1]
Ho, S [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Source
IEEE Transactions on Speech and Audio Processing
Keywords
composite kernels; eigenvoice speaker adaptation; generalized EM algorithm; kernel eigenvoice speaker adaptation; kernel principal component analysis
DOI
10.1109/TSA.2005.851971
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 s, is available. At the heart of the method is principal component analysis (PCA), which is used to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike in the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, and in general it cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model, which are needed to compute state observation likelihoods during the estimation of the eigenvoice weights and during subsequent decoding. Our solution is to use composite kernels in such a way that state observation likelihoods can be computed using only kernel functions, without needing a speaker-adapted model in the input supervector space. In this paper, we investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, KEV speaker adaptation is found to be equally effective with either form of composite Gaussian kernel, and both outperform a speaker-independent model as well as the adapted models found by EV, MAP, or MLLR adaptation when 2.1 s or 4.1 s of speech is available. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas none of EV, MAP, or MLLR adaptation is effective at all.
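Two ingredients named in the abstract can be illustrated concretely: kernel PCA over the training speakers' supervectors, and the direct sum and tensor product composite kernels. The following is a minimal NumPy sketch, not the authors' implementation. It assumes each supervector is the concatenation of `n_gauss` Gaussian mean vectors of equal dimension and that every constituent kernel is an isotropic Gaussian kernel with a shared width `beta`; the function names, toy sizes, and random data are hypothetical. It does not attempt the kernel-only likelihood computation or the generalized EM estimation of the eigenvoice weights.

```python
import numpy as np


def gaussian_kernel(a, b, beta):
    """Isotropic Gaussian kernel exp(-beta * ||a - b||^2)."""
    diff = a - b
    return float(np.exp(-beta * np.dot(diff, diff)))


def direct_sum_kernel(x, y, n_gauss, beta):
    """Sum of Gaussian kernels over the n_gauss constituent mean vectors.

    Assumes (hypothetically) that x and y are supervectors built from
    n_gauss equal-length blocks, one block per Gaussian mean.
    """
    return sum(gaussian_kernel(a, b, beta)
               for a, b in zip(np.split(x, n_gauss), np.split(y, n_gauss)))


def tensor_product_kernel(x, y, n_gauss, beta):
    """Product of Gaussian kernels over the same assumed block layout."""
    return float(np.prod([gaussian_kernel(a, b, beta)
                          for a, b in zip(np.split(x, n_gauss), np.split(y, n_gauss))]))


def kernel_pca(X, kernel, n_components):
    """Kernel PCA on N training supervectors (the rows of X).

    Returns the leading eigenvalues of the centred kernel matrix, expansion
    coefficients alphas (N x n_components) scaled so that each kernel
    eigenvoice has unit norm in feature space, and the raw kernel matrix K.
    """
    n = X.shape[0]
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    alphas = eigvecs / np.sqrt(eigvals)          # enforces lambda * ||alpha||^2 = 1
    return eigvals, alphas, K


def project(x, X, K, alphas, kernel):
    """Coordinates of supervector x along the kernel eigenvoices."""
    kx = np.array([kernel(xi, x) for xi in X])
    kx_c = kx - kx.mean() - K.mean(axis=1) + K.mean()   # centred kernel vector
    return alphas.T @ kx_c


# Toy usage: random vectors stand in for speaker supervectors (hypothetical sizes).
rng = np.random.default_rng(0)
n_gauss, dim = 4, 3
X = rng.normal(size=(10, n_gauss * dim))


def kern(a, b):
    return direct_sum_kernel(a, b, n_gauss, beta=0.1)


eigvals, alphas, K = kernel_pca(X, kern, n_components=3)
coords = project(X[0], X, K, alphas, kern)
```

With a shared `beta`, the tensor product of Gaussian kernels reduces to a single Gaussian kernel on the whole supervector, because the exponents add. The direct sum kernel instead combines per-Gaussian similarities additively, so one poorly matched Gaussian cannot by itself drive the kernel value to zero. As a sanity check, projecting a training supervector returns its eigenvalue-scaled expansion coefficients.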
Pages: 984-992
Page count: 9