Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Cited by: 17
Authors
Zhou, BW [1]
Hansen, JHL [1]
Affiliations
[1] IBM Corp, Yorktown Hts, NY 10598 USA
Source
Funding
US National Science Foundation;
Keywords
discriminative acoustic model; eigenspace mapping; hidden Markov models; rapid speaker adaptation; speech recognition;
DOI
10.1109/TSA.2005.845808
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discrimination information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data with superior performance to conventional adaptation techniques such as MLLR and block diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR by encompassing important speaker dependent discriminative information. A significant relative improvement of 24.6% over baseline is observed using 4.5 s of adaptation data by combining MLLR and EigMap techniques.
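As a rough illustration of the observation the abstract builds on (the leading eigendirections of an utterance covariance matrix), the following minimal Python sketch estimates the first eigendirection of a covariance matrix by power iteration. It is not the authors' EigMap implementation: the function names, the synthetic 3-dimensional "feature frames", and the choice of power iteration are all assumptions made purely for illustration.

```python
import math
import random


def covariance(frames):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        centered = [f[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def leading_eigendirection(cov, iters=500):
    """Power iteration: (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary non-zero starting direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue estimate for unit v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


# Hypothetical stand-in for one utterance: 200 frames whose variance is
# dominated by the first feature dimension (standard deviation 5 vs. 1 and 0.5).
rng = random.Random(0)
frames = [[rng.gauss(0, 5.0), rng.gauss(0, 1.0), rng.gauss(0, 0.5)]
          for _ in range(200)]

eigval, direction = leading_eigendirection(covariance(frames))
print("largest eigenvalue:", round(eigval, 2))
print("leading eigendirection:", [round(x, 3) for x in direction])
```

In this toy setup the recovered direction lies close to the first coordinate axis, because that is where the synthetic variance is concentrated. How EigMap itself uses such directions to map model parameters is described in the paper, not in this sketch.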
Pages: 554-564
Number of pages: 11
Related papers
50 in total
  • [21] A novel method for rapid speaker adaptation based on support speaker weighting
    Cai, T
    Zhu, J
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 993 - 996
  • [22] A Fast Speaker Adaptation Method using Aspect Model
    Hahm, Seongjun
    Ito, Akinori
    Makino, Shozo
    Suzuki, Motoyuki
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1221 - 1224
  • [23] MULTIMODAL SPEAKER ADAPTATION OF ACOUSTIC MODEL AND LANGUAGE MODEL FOR ASR USING SPEAKER FACE EMBEDDING
    Moriya, Yasufumi
    Jones, Gareth J. F.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8643 - 8647
  • [24] Discriminative adaptation based on fast combination of DMAP and DfMLLR
    Machlica, Lukas
    Zajic, Zbynek
    Mueller, Ludek
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 534 - 537
  • [25] Correctness-Adjusted Unsupervised Discriminative Acoustic Model Adaptation
    Gibson, Matthew
    Hain, Thomas
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (10): : 2648 - 2656
  • [26] MULTI-TASK DEEP NEURAL NETWORK ACOUSTIC MODELS WITH MODEL ADAPTATION USING DISCRIMINATIVE SPEAKER IDENTITY FOR WHISPER RECOGNITION
    Li, Jingjie
    McLoughlin, Ian
    Liu, Cong
    Xue, Shaofei
    Wei, Si
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4969 - 4973
  • [27] Eigenvoice based fast speaker adaptation with bias compensation
    Park, JS
    Song, HJ
    Kim, HS
    KORUS 2003: 7TH KOREA-RUSSIA INTERNATIONAL SYMPOSIUM ON SCIENCE AND TECHNOLOGY, VOL 2, PROCEEDINGS: ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY, 2003, : 108 - 112
  • [28] Online Speaker Adaptation of an Acoustic Model Using Face Recognition
    Campr, Pavel
    Prazak, Ales
    Psutka, Josef V.
    Psutka, Josef
    TEXT, SPEECH, AND DIALOGUE, TSD 2013, 2013, 8082 : 378 - 385
  • [29] SVM based speaker selection using GMM supervector for rapid speaker adaptation
    Wang, Jian
    Lei, Jianjun
    Guo, Jun
    Yang, Zhen
    SIMULATED EVOLUTION AND LEARNING, PROCEEDINGS, 2006, 4247 : 617 - 624
  • [30] APPLICATION OF SVM-BASED CORRECTNESS PREDICTIONS TO UNSUPERVISED DISCRIMINATIVE SPEAKER ADAPTATION
    Gibson, Matthew
    Hain, Thomas
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4341 - 4344