Generative factor analyzed HMM for automatic speech recognition

Cited by: 4

Authors:

Yao, KS

Paliwal, KK

Lee, TW

Affiliations:

[1] Univ Calif San Diego, Inst Neural Computat, La Jolla, CA 92093 USA

[2] Griffith Univ, Sch Microelect Engn, Brisbane, Qld 4111, Australia

Source:

SPEECH COMMUNICATION | 2005 / Vol. 45 / Issue 4

Keywords:

hidden Markov models; factor analysis; mixture of Gaussian; speech recognition; expectation maximization algorithm;

DOI:

10.1016/j.specom.2005.01.002

Chinese Library Classification (CLC):

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by mixtures of Gaussians (MoG) that depend on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors at one level are acoustic-unit dependent and latent vectors at a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model. A set of experiments verifies the potential of the GFA-HMM as an alternative acoustic modeling technique. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained a more compact representation than the standard HMM. In other experiments with various noise types and speaking styles, the GFA-HMM achieved statistically significant improvement over the standard HMM. (c) 2005 Elsevier B.V. All rights reserved.

Pages: 435-454

Page count: 20
