Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Cited by: 17

Authors:

Zhou, BW [1]

Hansen, JHL [1]

Affiliations:

[1] IBM Corp, Yorktown Hts, NY 10598 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 4

Funding:

National Science Foundation (USA);

Keywords:

discriminative acoustic model; eigenspace mapping; hidden Markov models; rapid speaker adaptation; speech recognition;

DOI:

10.1109/TSA.2005.845808

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discrimination information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data with superior performance to conventional adaptation techniques such as MLLR and block diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR by encompassing important speaker dependent discriminative information. A significant relative improvement of 24.6% over baseline is observed using 4.5 s of adaptation data by combining MLLR and EigMap techniques.
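The abstract's starting observation concerns the leading eigendirections of an utterance covariance matrix. The sketch below illustrates only that first step, not the EigMap algorithm itself, whose details are not given in this record. The function name `leading_eigendirections`, the 13-dimensional features, the synthetic data, and the 10 ms frame shift (which makes 450 frames roughly 4.5 s) are all assumptions made for illustration.

```python
import numpy as np


def leading_eigendirections(frames, k):
    """Return the k leading eigenvectors (as columns) and their eigenvalues
    for the sample covariance of `frames`, an array of shape (T, D)."""
    centered = frames - frames.mean(axis=0)
    cov = centered.T @ centered / (frames.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order], eigvals[order]


# Synthetic stand-in for feature frames (hypothetical data, not from the paper):
# 450 frames x 13 dimensions, with most variance placed along dimension 0.
rng = np.random.default_rng(0)
frames = rng.normal(size=(450, 13))
frames[:, 0] *= 5.0

directions, variances = leading_eigendirections(frames, k=3)
```

With real adaptation data, `frames` would hold the acoustic feature vectors of the test speaker's utterance, and `directions` would span a candidate subspace in which speaker-dependent structure could be examined.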


Pages: 554-564

Page count: 11


