Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system

Cited by: 25

Authors:

Pujol, P

Pol, S

Nadeu, C

Hagen, A

Bourlard, H

Affiliations:

[1] Univ Politecn Catalunya, Talp Res Ctr, ES-08034 Barcelona, Spain

[2] INESC ID, Spoken Language Syst Lab L2F, P-1000029 Lisbon, Portugal

[3] IDIAP, CH-1920 Martigny, Switzerland

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / Issue 01

Keywords:

frequency filtering (FF); multistream; MLP; product rule; relative spectra (Rasta); robustness;

DOI:

10.1109/TSA.2004.834466

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency derivatives of the logFBEs, lie in the frequency domain, and have shown good recognition performance with respect to the conventional mel-frequency cepstral coefficients (MFCCs) for hidden Markov models (HMM) based systems. In this paper, the FF features are first compared with the MFCCs and the relative spectral perceptual linear prediction (Rasta-PLP) features using both a hybrid HMM/MLP and a usual HMM/Gaussian mixture models (HMM/GMM) based recognition system, for both clean and noisy speech. Taking advantage of the ability of the hybrid system to deal with correlated features, the inclusion of both the frequency second-derivatives and the raw logFBEs as additional features is proposed and tested. Moreover, the robustness of these features in noisy conditions is enhanced by combining the FF technique with the Rasta temporal filtering approach. Finally, a study of the FF features in the framework of multistream processing is presented. The best recognition results for both clean and noisy speech are obtained from the multistream combination of the J-Rasta-PLP features and the FF features.
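
The abstract describes the FF parameters as frequency derivatives of the logFBEs, with frequency second-derivatives proposed as additional features. The sketch below illustrates that idea only. It assumes a central difference across adjacent bands for the first derivative, a standard three-point second difference, and zero padding at the band edges. The function names, the padding choice, and the example values are illustrative assumptions and are not taken from this record.

```python
def frequency_filter(log_fbe):
    """First frequency derivative (assumed form): out[k] = x[k+1] - x[k-1].

    The band sequence is zero-padded at both ends (an assumption).
    """
    padded = [0.0] + list(log_fbe) + [0.0]
    return [padded[k + 2] - padded[k] for k in range(len(log_fbe))]


def frequency_second_difference(log_fbe):
    """Second frequency derivative (assumed form): x[k+1] - 2*x[k] + x[k-1]."""
    padded = [0.0] + list(log_fbe) + [0.0]
    return [
        padded[k + 2] - 2.0 * padded[k + 1] + padded[k]
        for k in range(len(log_fbe))
    ]


# Hypothetical log filter-bank energies for one frame with four bands.
frame = [1.0, 2.0, 4.0, 7.0]
ff_features = frequency_filter(frame)
ff2_features = frequency_second_difference(frame)
print(ff_features)   # [2.0, 3.0, 5.0, -4.0]
print(ff2_features)  # [0.0, 1.0, 1.0, -10.0]
```

Both functions operate along the band index of a single frame, so the outputs stay in the frequency domain and have the same length as the input, one value per band.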


Pages: 14-22

Number of pages: 9
