Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition

Cited by: 8
Authors
Shi, Yanyan [1 ]
Bai, Jing [1 ]
Xue, Peiyun [1 ]
Shi, Dianxi [2 ,3 ]
Affiliations
[1] Taiyuan Univ Technol, Coll Informat & Comp, Taiyuan 030024, Shanxi, Peoples R China
[2] NIIDT, AIRC, Beijing 100071, Peoples R China
[3] TAIIC, Tianjin 300457, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Keywords
Cochlear filter cepstral coefficients; Teager energy operators cepstral coefficients; principal component analysis; speech recognition
DOI
10.1109/ACCESS.2019.2918147
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Environmental noise can threaten the stable operation of current speech recognition systems. It is therefore essential to develop a front-end feature set that can identify speech at low signal-to-noise ratios. In this paper, a robust fusion feature is proposed that can fully characterize speech information. To obtain the cochlear filter cepstral coefficients (CFCC), a novel feature is first extracted using a power-law nonlinear function, which can simulate the auditory characteristics of the human ear. Speech enhancement is then introduced into the front end of feature extraction, and the extracted features and their first-order differences are combined into new mixed features. An energy feature, the Teager energy operator cepstral coefficient (TEOCC), is also extracted and combined with the above-mentioned mixed features to form the fusion feature sets. Principal component analysis (PCA) is then applied for feature selection and optimization of the feature set, and the final feature set is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Finally, a comparative speech recognition experiment using a support vector machine (SVM) is designed to verify the advantages of the proposed feature set. The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but can also fully characterize the auditory and energy information in speech signals.
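The abstract names the Teager energy operator as the basis of the TEOCC energy feature but does not spell it out. As a minimal sketch, the commonly used discrete form is psi[n] = x[n]^2 - x[n-1]*x[n+1]; the function name `teager_energy` and the test signal below are illustrative assumptions, and the paper's full TEOCC pipeline (framing, filtering, cepstral analysis) is not reproduced here.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output has len(x) - 2 items.
    The name and signature are hypothetical, chosen for illustration only.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the paper): a pure sinusoid.
omega = 0.3       # digital frequency in radians per sample
amplitude = 2.0
signal = [amplitude * math.cos(omega * n) for n in range(50)]

psi = teager_energy(signal)

# For A*cos(omega*n) the operator is constant and equals (A*sin(omega))^2,
# so it tracks both the amplitude and the frequency of the oscillation.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the operator output is constant, which is one reason it is often used as an energy measure that reflects frequency as well as amplitude.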
Pages: 81911-81922
Page count: 12