Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system

Cited by: 25
Authors
Pujol, P
Pol, S
Nadeu, C
Hagen, A
Bourlard, H
Affiliations
[1] Univ Politecn Catalunya, Talp Res Ctr, ES-08034 Barcelona, Spain
[2] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal
[3] IDIAP, CH-1920 Martigny, Switzerland
Source
Keywords
frequency filtering (FF); multistream; MLP; product rule; relative spectra (Rasta); robustness;
DOI
10.1109/TSA.2004.834466
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency derivatives of the logFBEs, lie in the frequency domain and have shown good recognition performance compared with the conventional mel-frequency cepstral coefficients (MFCCs) in hidden Markov model (HMM) based systems. In this paper, the FF features are first compared with the MFCCs and the relative spectral perceptual linear prediction (Rasta-PLP) features, using both a hybrid HMM/MLP and a conventional HMM/Gaussian mixture model (HMM/GMM) based recognition system, for both clean and noisy speech. Taking advantage of the ability of the hybrid system to deal with correlated features, the inclusion of both the frequency second derivatives and the raw logFBEs as additional features is proposed and tested. Moreover, the robustness of these features in noisy conditions is enhanced by combining the FF technique with the Rasta temporal filtering approach. Finally, a study of the FF features in the framework of multistream processing is presented. The best recognition results for both clean and noisy speech are obtained from the multistream combination of the J-Rasta-PLP features and the FF features.
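As an illustration of the two ideas named in the abstract and keywords, the sketch below computes frequency derivatives of logFBEs and combines two streams of per-frame class posteriors with a product rule. It is not the authors' implementation. The abstract does not give the filter coefficients, the boundary handling, or the exact form of the combination, so the central-difference filter, the zero padding at the band edges, the second-difference form, and the plain multiply-and-renormalize rule are all assumptions. The function names are hypothetical.

```python
import numpy as np


def frequency_filter(log_fbe):
    """First frequency derivative of log filter-bank energies.

    log_fbe has shape (n_frames, n_bands). ASSUMPTION: central
    difference y[k] = x[k+1] - x[k-1] along the band axis, with
    zeros assumed beyond the first and last band.
    """
    x = np.asarray(log_fbe, dtype=float)
    padded = np.pad(x, ((0, 0), (1, 1)), mode="constant")
    return padded[:, 2:] - padded[:, :-2]


def frequency_second_derivative(log_fbe):
    """Second frequency derivative of log filter-bank energies.

    ASSUMPTION: second difference y[k] = x[k+1] - 2*x[k] + x[k-1],
    with the same zero padding as frequency_filter.
    """
    x = np.asarray(log_fbe, dtype=float)
    padded = np.pad(x, ((0, 0), (1, 1)), mode="constant")
    return padded[:, 2:] - 2.0 * x + padded[:, :-2]


def product_rule(posteriors_a, posteriors_b):
    """Combine two streams of class posteriors frame by frame.

    Both inputs have shape (n_frames, n_classes). ASSUMPTION: the
    posteriors are multiplied class by class and each frame is
    renormalized to sum to one; no prior correction is applied.
    """
    a = np.asarray(posteriors_a, dtype=float)
    b = np.asarray(posteriors_b, dtype=float)
    combined = a * b
    return combined / combined.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # One frame with four bands of made-up log energies.
    frame = np.array([[1.0, 2.0, 4.0, 7.0]])
    print(frequency_filter(frame))
    print(frequency_second_derivative(frame))

    # Two hypothetical posterior streams over two classes.
    stream_a = np.array([[0.5, 0.5]])
    stream_b = np.array([[0.8, 0.2]])
    print(product_rule(stream_a, stream_b))
```

Stacking the outputs of both derivative functions with the raw logFBEs gives a strongly correlated feature vector, which is the kind of input the abstract says the hybrid HMM/MLP system can handle.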
Pages: 14 - 22
Number of pages: 9
Related papers
50 items in total
  • [1] Comparison between two hybrid HMM/MLP approaches in speech recognition
    Fontaine, V
    Ris, C
    Leich, H
    Vantieghem, J
    Accaino, S
    VanCompernolle, D
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 3362 - 3365
  • [2] HIERARCHICAL HYBRID MLP/HMM OR RATHER MLP FEATURES FOR A DISCRIMINATIVELY TRAINED GAUSSIAN HMM: A COMPARISON FOR OFFLINE HANDWRITING RECOGNITION
    Dreuw, Philippe
    Doetsch, Patrick
    Plahl, Christian
    Ney, Hermann
    [J]. 2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011,
  • [3] An HMM/MLP hybrid approach for improving discrimination in speech recognition
    Na, K
    Chae, SI
    [J]. IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 156 - 159
  • [4] A study on recognition of speech based on HMM/MLP hybrid network
    Huang, XY
    Ma, XH
    Li, X
    Fu, YQ
    Lu, JR
    [J]. 2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 718 - 721
  • [5] Speech/speaker recognition using a HMM/GMM hybrid model
    Rodriguez, E
    Ruiz, B
    Garcia-Crespo, A
    Garcia, F
    [J]. AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, 1997, 1206 : 227 - 234
  • [6] Applying dynamic context into MLP/HMM speech recognition system
    Salmela, P
    [J]. COMPUTER SPEECH AND LANGUAGE, 2000, 15 (03) : 233 - 255
  • [7] HMM-GMM based Amazigh speech recognition system
    El Ouahabi, Safaa
    Atounti, Mohamed
    Bellouki, Mohamed
    [J]. INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2020, 12 (1-2) : 47 - 53
  • [8] REVISITING HYBRID AND GMM-HMM SYSTEM COMBINATION TECHNIQUES
    Swietojanski, Pawel
    Ghoshal, Arnab
    Renals, Steve
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6744 - 6748
  • [9] Discriminant Learning for Hybrid HMM/MLP Speech Recognition System using a Fuzzy Genetic Clustering
    Lazli, Lilia
    Laskri, Mohamed-Tayeb
    Boudour, Rachid
    [J]. PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 76 - 81
  • [10] Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study
    Zarrouk, Elyes
    Ben Ayed, Yassine
    Gargouri, Faiez
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2014, 17 (03) : 223 - 233