Convolutional density estimation in hidden Markov models for speech recognition

Cited by: 3

Authors:

Matsoukas, S [1]

Zavaliagkos, G [1]

Affiliations:

[1] BBN Syst & Technol Corp, GTE Internetworking, Cambridge, MA 02138 USA

Source:

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999

Keywords:

DOI:

10.1109/ICASSP.1999.758075

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In continuous density Hidden Markov Models (HMMs) for speech recognition, the probability density function (pdf) for each state is usually expressed as a mixture of Gaussians. In this paper, we present a model in which the pdf is expressed as the convolution of two densities. We focus on the special case where one of the convolved densities is an M-Gaussian mixture and the other is a mixture of N impulses. We present the reestimation formulae for the parameters of the M x N convolutional model and suggest two ways of initializing them: the residual K-Means approach, and deconvolution from a standard HMM with MN Gaussians per state, using a genetic algorithm to search for the optimal assignment of Gaussians. Both methods result in a compact representation that requires only O(M + N) storage space for the model parameters and O(MN) time for training and decoding. We explain how the decoding time can be reduced to O(M + kN), where k < M. Finally, results are shown on the 1996 Hub-4 Development test, demonstrating that a 32 x 2 convolutional model can achieve performance comparable to that of a standard 64-Gaussian-per-state model.
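
The structure described in the abstract can be illustrated with a small sketch. Convolving a Gaussian with an impulse located at d yields the same Gaussian shifted by d, so an M-Gaussian mixture convolved with an N-impulse mixture is equivalent to an MN-Gaussian mixture whose components share parameters. The code below is a one-dimensional toy, not the authors' implementation: the function names, the scalar features, and all numeric values are assumptions made for illustration, and the reestimation, initialization, and fast-decoding steps from the paper are not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convolutional_density(x, weights, means, variances, impulse_weights, shifts):
    """Evaluate (Gaussian mixture) * (impulse mixture) at x.

    Each of the M Gaussians is paired with each of the N impulses, so the
    evaluation costs O(MN) while only O(M + N) parameters are stored.
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        for v, d in zip(impulse_weights, shifts):
            # A Gaussian convolved with an impulse at d is the Gaussian shifted by d.
            total += w * v * gaussian_pdf(x, mu + d, var)
    return total


def expand_to_mixture(weights, means, variances, impulse_weights, shifts):
    """Return the equivalent flat mixture as (weight, mean, variance) triples."""
    return [
        (w * v, mu + d, var)
        for w, mu, var in zip(weights, means, variances)
        for v, d in zip(impulse_weights, shifts)
    ]


# Hypothetical toy parameters: M = 3 Gaussians, N = 2 impulses.
weights = [0.5, 0.3, 0.2]
means = [-1.0, 0.0, 2.0]
variances = [1.0, 0.5, 2.0]
impulse_weights = [0.6, 0.4]
shifts = [-0.25, 0.75]

expanded = expand_to_mixture(weights, means, variances, impulse_weights, shifts)
x = 0.3
p_conv = convolutional_density(x, weights, means, variances, impulse_weights, shifts)
p_flat = sum(c * gaussian_pdf(x, m, s) for c, m, s in expanded)
print(len(expanded), p_conv, p_flat)
```

In this toy, the compact form stores 3 Gaussians and 2 impulses, while the expanded form has 6 components; the two evaluations agree because they are the same density written in two ways.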

Pages: 113-116

Page count: 4

