Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition

Cited by: 8

Authors:

Shi, Yanyan [1]

Bai, Jing [1]

Xue, Peiyun [1]

Shi, Dianxi [2,3]

Affiliations:

[1] Taiyuan Univ Technol, Coll Informat & Comp, Taiyuan 030024, Shanxi, Peoples R China

[2] NIIDT, AIRC, Beijing 100071, Peoples R China

[3] TAIIC, Tianjin 300457, Peoples R China

Source:

IEEE ACCESS | 2019 / Vol. 7

Keywords:

Cochlear filter cepstral coefficients; Teager energy operators cepstral coefficients; principal component analysis; speech recognition;

DOI:

10.1109/ACCESS.2019.2918147

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Environmental noise can threaten the stable operation of current speech recognition systems. It is therefore essential to develop a front-end feature set that can identify speech at low signal-to-noise ratios. In this paper, a robust fusion feature is proposed that can fully characterize speech information. First, a novel cochlear filter cepstral coefficient (CFCC) feature is extracted using a power-law nonlinear function, which simulates the auditory characteristics of the human ear. Speech enhancement technology is then introduced into the front end of feature extraction, and the extracted features and their first-order differences are combined into new mixed features. An energy feature, the Teager energy operator cepstral coefficient (TEOCC), is also extracted and combined with the above-mentioned mixed features to form the fusion feature sets. Principal component analysis (PCA) is then applied for feature selection and optimization of the feature set, and the final feature set is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Finally, a comparative speech recognition experiment is designed to verify the advantages of the proposed feature set using a support vector machine (SVM). The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but can also fully characterize the auditory and energy information in the speech signals.
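The abstract names the Teager energy operator and first-order differences but gives no formulas. The following is a minimal sketch, not the authors' implementation. It assumes the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and a plain frame-to-frame difference as the first-order difference; the paper may use a different delta formula. The function names `teager_energy` and `first_order_difference` are hypothetical. The filterbank, cepstral transform, speech enhancement, PCA, and SVM stages are not shown.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator (standard form, assumed here).

    psi[n] = x[n]^2 - x[n-1] * x[n+1]
    The two boundary samples have no neighbours on one side, so the
    result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def first_order_difference(frames):
    """Simple frame-to-frame difference of feature vectors.

    This is an assumption: a plain difference between consecutive frames,
    returning len(frames) - 1 vectors.
    """
    return [
        [cur_v - prev_v for prev_v, cur_v in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]


# Usage: for a pure sinusoid A*cos(w*n) the operator is constant and
# equals A^2 * sin(w)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(signal)

# Usage: deltas of three toy two-dimensional feature frames.
toy_frames = [[1.0, 2.0], [3.0, 5.0], [6.0, 9.0]]
deltas = first_order_difference(toy_frames)
```

Because the operator responds to both amplitude and instantaneous frequency, it is a common basis for energy features; how the paper turns it into cepstral coefficients is not described in this record.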


Pages: 81911-81922

Page count: 12

