Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Cited by: 17
Authors
Zhou, BW [1 ]
Hansen, JHL [1 ]
Affiliation
[1] IBM Corp, Yorktown Hts, NY 10598 USA
Source
IEEE Transactions on Speech and Audio Processing
Funding
National Science Foundation (USA)
Keywords
discriminative acoustic model; eigenspace mapping; hidden Markov models; rapid speaker adaptation; speech recognition
DOI
10.1109/TSA.2005.845808
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and the acoustic environment. This paper verifies that the leading principal eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms called Eigenspace Mapping (EigMap) is proposed and applied to continuous-density hidden Markov model (HMM) based speech recognition. EigMap rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving, along the directions of that eigenspace, the discriminative information learned by the baseline models. In addition, the adapted models are compressed by discarding model parameters that are assumed to carry no discriminative information. The core idea of EigMap can be extended in many ways, and a family of EigMap-based algorithms is described. Unsupervised adaptation experiments show that EigMap effectively improves baseline models using very limited amounts of adaptation data and outperforms conventional adaptation techniques such as MLLR and block-diagonal MLLR. Using EigMap with only about 4.5 s of adaptation data, a relative improvement of 18.4% over the baseline recognizer is achieved. It is also demonstrated that the gains from EigMap are additive to those from MLLR, since EigMap captures important speaker-dependent discriminative information. Combining MLLR and EigMap yields a significant relative improvement of 24.6% over the baseline with 4.5 s of adaptation data.
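As a minimal illustration of the observation the abstract starts from (this is not the EigMap algorithm itself), the sketch below estimates the covariance matrix of one utterance's feature frames, extracts its leading eigendirections by power iteration with deflation, and projects a vector (for example, a Gaussian mean) onto those directions. The function names `utterance_covariance`, `leading_eigendirections`, and `project`, the plain-list data layout, and the use of power iteration instead of a full eigensolver are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def utterance_covariance(frames: List[Vector]) -> Matrix:
    """Sample covariance of one utterance's feature frames (needs >= 2 frames)."""
    n = len(frames)
    d = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        c = [f[j] - mean[j] for j in range(d)]  # centered frame
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in row] for row in cov]


def leading_eigendirections(
    cov: Matrix, k: int, iters: int = 500, seed: int = 0
) -> List[Tuple[float, Vector]]:
    """Top-k (eigenvalue, unit eigenvector) pairs of a symmetric PSD matrix.

    Hypothetical helper: power iteration finds the dominant eigenvector,
    then deflation removes it so the next pass finds the next one.
    """
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]  # working copy, deflated after each pair
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def project(vec: Vector, directions: List[Vector]) -> Vector:
    """Coordinates of vec along each (unit) eigendirection."""
    return [sum(x * y for x, y in zip(vec, d)) for d in directions]
```

In this sketch the span of the leading eigendirections stands in for "the test speaker's eigenspace"; how EigMap actually maps the baseline models' discriminative information into that space is described in the paper and is not reproduced here.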
Pages: 554-564
Number of pages: 11