Generative factor analyzed HMM for automatic speech recognition

Cited by: 4
Authors
Yao, KS
Paliwal, KK
Lee, TW
Affiliations
[1] Univ Calif San Diego, Inst Neural Computat, La Jolla, CA 92093 USA
[2] Griffith Univ, Sch Microelect Engn, Brisbane, Qld 4111, Australia
Keywords
hidden Markov models; factor analysis; mixture of Gaussian; speech recognition; expectation maximization algorithm
DOI
10.1016/j.specom.2005.01.002
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by a mixture of Gaussians (MoG) that depends on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of the observation vectors, where latent vectors at one level are acoustic-unit dependent and latent vectors at a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model. We verify the potential of the GFA-HMM as an alternative acoustic modeling technique through a set of experiments. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained a more compact representation than the standard HMM. In other experiments with various noise types and speaking styles, the GFA-HMM achieved a statistically significant improvement over the standard HMM. © 2005 Elsevier B.V. All rights reserved.
Pages: 435-454
Number of pages: 20