Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Cited by: 17
Authors
Zhou, BW [1]
Hansen, JHL [1]
Affiliation
[1] IBM Corp, Yorktown Hts, NY 10598 USA
Source
Funding
National Science Foundation (USA);
Keywords
discriminative acoustic model; eigenspace mapping; hidden Markov models; rapid speaker adaptation; speech recognition;
DOI
10.1109/TSA.2005.845808
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discrimination information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data with superior performance to conventional adaptation techniques such as MLLR and block diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR by encompassing important speaker dependent discriminative information. A significant relative improvement of 24.6% over baseline is observed using 4.5 s of adaptation data by combining MLLR and EigMap techniques.
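The abstract's central observation concerns the leading eigendirections of the utterance covariance matrix. As a minimal sketch of that one step only, and not the EigMap algorithm itself, the following estimates those directions from a matrix of feature frames. The function name `primary_eigendirections`, the 13-dimensional synthetic features, and the 450-frame length (roughly 4.5 s at an assumed 10 ms frame shift) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def primary_eigendirections(frames, k=2):
    """Return the k largest eigenvalues and matching eigenvectors of the
    covariance matrix of `frames`, shaped (num_frames, feature_dim).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    frames = np.asarray(frames, dtype=float)
    # Covariance across frames: each column is one feature dimension.
    cov = np.cov(frames, rowvar=False)
    # eigh is suited to symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]


# Synthetic stand-in for one short utterance (assumed sizes, random data).
rng = np.random.default_rng(0)
scales = np.array([5.0, 2.0] + [0.5] * 11)   # most variance on axis 0
frames = rng.standard_normal((450, 13)) * scales

vals, vecs = primary_eigendirections(frames, k=2)
```

Here `vecs` holds one eigendirection per column, ordered by decreasing variance. With real speech features, these columns would be the speaker-dependent directions the abstract refers to.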
Pages: 554-564
Number of pages: 11
Related papers
50 in total
  • [41] Discriminative likelihood score weighting based on acoustic-phonetic classification for speaker identification
    Suh, Youngjoo
    Kim, Hoirin
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2014, : 1 - 7
  • [42] LEARNING TASK-DEPENDENT SPEECH VARIABILITY IN DISCRIMINATIVE ACOUSTIC MODEL ADAPTATION
    Sato, Shoei
    Oku, Takahiro
    Homma, Shinichi
    Kobayashi, Akio
    Imai, Toru
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4910 - 4913
  • [43] Discriminative training for speaker identification based on maximum model distance algorithm
    Hong, QY
    Kwong, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 25 - 28
  • [44] Speaker indexing and adaptation using speaker clustering based on statistical model selection
    Nishida, M
    Kawahara, T
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 353 - 356
  • [45] UBM based speaker selection and model re-estimation for speaker adaptation
    Wang, Jian
    Guo, Jun
    Liu, Gang
    Lei, Jianjun
    PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, VOLS 1 AND 2, 2006, : 856 - 860
  • [46] Acoustic model enhancement: An adaptation technique for speaker verification under noisy environments
    Moreno-Daniel, A.
    Nolazco-Flores, J. A.
    Wada, T.
    Juang, B. -H.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 289 - +
  • [47] Improving rapid unsupervised speaker adaptation based on hmm sufficient statistics
    Gomez, Randy
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1001 - 1004
  • [48] A SAMPLING-BASED ENVIRONMENT POPULATION PROJECTION APPROACH FOR RAPID ACOUSTIC MODEL ADAPTATION
    Tsao, Yu
    Matsuda, Shigeki
    Sakai, Shinsuke
    Isotani, Ryosuke
    Kawai, Hisashi
    Nakamura, Satoshi
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5504 - 5507
  • [49] Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model
    Hiroya, S
    Honda, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1071 - 1078
  • [50] Unsupervised Lattice-based Acoustic Model Adaptation for Speaker-Dependent Conversational Telephone Speech Transcription
    Thambiratnam, K.
    Seide, E.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1567 - 1570