Speaker adaptation in the maximum a posteriori framework based on the probabilistic 2-mode analysis of training models

Cited by: 0

Authors:

Jeong, Yongwon [1]

Affiliations:

[1] Pusan Natl Univ, Sch Elect Engn, Pusan 609735, South Korea

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2013

Keywords:

Speech recognition; Speaker adaptation; Probabilistic tensor analysis; Tucker decomposition; Hidden Markov models; Likelihood

DOI:

10.1186/1687-4722-2013-7

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We apply probabilistic 2-mode analysis to speaker adaptation by representing each of the hidden Markov model mean vectors of training speakers as a matrix, and derive the speaker adaptation equation in the maximum a posteriori (MAP) framework. The adaptation equation becomes similar to the speaker adaptation equation using the MAP linear regression adaptation. In the experiments, the adapted models based on probabilistic 2-mode analysis showed performance improvement over the adapted models based on Tucker decomposition, which is a representative multilinear decomposition technique, for small amounts of adaptation data while maintaining good performance for large amounts of adaptation data.

Pages: 11