Factor Analyzed HMM Topology for Speech Recognition

被引:0
|
作者
Ting, Chuan-Wei [1 ]
Chien, Jen-Tzung [1 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
关键词
factor analysis; similarity measure; HMM topology; speech recognition;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a new factor analyzed (FA) similarity measure between two Gaussian mixture models (GMMs). An adaptive hidden Markov model (HMM) topology is built to compensate the pronunciation variations in speech recognition. Our idea aims to evaluate whether the variation of a HMM state from new speech data is significant or not and judge if a new state should be generated in the models. Due to the effectiveness of FA data analysis, we measure the GMM similarity by estimating the common factors and specific factors embedded in the HMM means and variances. Similar Gaussian densities are represented by the common factors. Specific factors express the residual of similarity measure. We perform a composite hypothesis test due to common factors as well as specific factors. An adaptive HMM topology is accordingly established from continuous collection of training utterances. Experiments show that the proposed FA measure outperforms other measures with comparable size of parameters.
引用
收藏
页码:1407 / 1410
页数:4
相关论文
共 50 条
  • [41] HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications
    Frihia H.
    Bahi H.
    International Journal of Speech Technology, 2017, 20 (3) : 563 - 573
  • [42] Speech recognition based on genetic algorithm for training HMM
    Sun, F
    Hu, GR
    ELECTRONICS LETTERS, 1998, 34 (16) : 1563 - 1564
  • [43] Using word temporal structure in HMM speech recognition
    Fissore, L
    Laface, P
    Ravera, F
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 975 - 978
  • [44] A coupled HMM for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Xiaoxiang, L
    Mao, C
    Murphy, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2013 - 2016
  • [45] English Speech Recognition System Based on HMM in Matlab
    Yang Xiaocui
    Sun Lihua
    PROCEEDINGS OF 2009 CONFERENCE ON COMMUNICATION FACULTY, 2009, : 337 - 341
  • [46] Peripheral features for HMM-based speech recognition
    Fukuda, T
    Takigawa, M
    Nitta, T
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 129 - 132
  • [47] A 2D extended HMM for speech recognition
    Li, JY
    Murua, A
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 349 - 352
  • [48] Using word temporal structure in HMM speech recognition
    Fissore, L.
    Laface, P.
    Ravera, F.
    CSELT Technical Reports, 1997, 25 (03): : 345 - 352
  • [49] An Optimized Model for Visual Speech Recognition Using HMM
    Paramasivam, Sujatha
    Murugesanadar, Radhakrishnan
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (02) : 194 - 201
  • [50] Increasing speech recognition robustness with HMM2
    Weber, K
    Bengio, S
    Boulard, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 929 - 932