Generative factor analyzed HMM for automatic speech recognition

Cited by: 4
Authors
Yao, KS
Paliwal, KK
Lee, TW
Affiliations
[1] Univ Calif San Diego, Inst Neural Computat, La Jolla, CA 92093 USA
[2] Griffith Univ, Sch Microelect Engn, Brisbane, Qld 4111, Australia
Keywords
hidden Markov models; factor analysis; mixture of Gaussian; speech recognition; expectation maximization algorithm;
DOI
10.1016/j.specom.2005.01.002
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by a mixture of Gaussians (MoG) that depends on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model. A set of experiments verifies the potential of the GFA-HMM as an alternative acoustic modeling technique. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained a more compact representation than the standard HMM. In other experiments with various noise types and speaking styles, the GFA-HMM achieved statistically significant improvements over the standard HMM. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 435-454
Number of pages: 20
Related papers
50 in total
  • [31] Asynchronous HMM with applications to speech recognition
    Garg, A
    Balakrishnan, S
    Vaithyanathan, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1009 - 1012
  • [32] A Novel Model Characteristics for Noise-Robust Automatic Speech Recognition Based on HMM
    Rafieee, M. Saadeq
    Khazaei, Ali Akbar
    2010 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND INFORMATION SECURITY (WCNIS), VOL 2, 2010, : 215 - 218
  • [33] On HMM speech recognition based on complex speech analysis
    Kinjo, Tatsuhiko
    Funaki, Keiichi
    IECON 2006 - 32ND ANNUAL CONFERENCE ON IEEE INDUSTRIAL ELECTRONICS, VOLS 1-11, 2006, : 2605 - +
  • [34] Adaptive HMM Topology for Speech Recognition
    Ting, Chuan-Wei
    Lee, Kuo-Yuan
    Chien, Jen-Tzung
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1237 - 1240
  • [35] Automatic speech segmentation for Chinese speech database based on HMM
    Tao, JH
    Hain, HU
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 481 - 484
  • [36] Enhancing Automatic Speech Recognition Quality with a Second-Stage Speech Enhancement Generative Adversarial Network
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 546 - 552
  • [37] DNN-HMM-based automatic speech recognition system for intelligent LED lighting control
    Xian, J. L.
    Cai, W. X.
    Pan, H. X.
    Chen, N. Z.
    Chen, X. Y.
    Sun, Y. W.
    Yan, D.
    AUTOMATIC CONTROL, MECHATRONICS AND INDUSTRIAL ENGINEERING, 2019, : 73 - 78
  • [38] A Comparision of Multiclass SVM and HMM Classifier for Wavelet Front End Robust Automatic Speech Recognition
    Rajeswari
    Prasad, N. N. S. S. R. K.
    Sathyanarayana, V
    2013 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS AND NETWORKING TECHNOLOGIES (ICCCNT), 2013,
  • [39] EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION
    Bahar, Parnia
    Makarovi, Nikita
    Zeyer, Albert
    Schlueter, Ralf
    Ney, Hermann
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7854 - 7858
  • [40] Towards knowledge-based features for HMM based large vocabulary automatic speech recognition
    Launay, B
    Siohan, O
    Surendran, A
    Lee, CH
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 817 - 820