Filterbank Analysis of MFCC Feature Extraction in Robust Children Speech Recognition

被引:0
|
作者
Naing, Hay Mar Soe [1 ]
Miyanaga, Yoshikazu [2 ]
Hidayat, Risanuri [1 ]
Winduratna, Bondhan [1 ]
机构
[1] Gadjah Mada Univ, Dept Elect Engn & Informat Technol, Yogyakarta 55281, Indonesia
[2] Hokkaido Univ, GS Informat Sci & Techonol, GI CoRE GSB, Sapporo, Hokkaido 0600814, Japan
关键词
children speech recognition; gammatone frequency integration; MFCC; speaker adaptative model;
D O I
10.1109/ismac.2019.8836181
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper focused on the issue of robustness in children speech recognition system. The shape of filterbank analysis is presented in this study to suppress the additive background noise in acoustic features of children speakers. In addition, the Linear Discriminant Analysis (LDA), the Maximum Likelihood Linear Transform (MLLT) and feature space Maximum Likelihood Linear Regression (fMLLR) features are applied to build the speaker adaptive acoustic model with the help of Kaldi speech recognition toolkit. The performance of Gammatone filterbank and Bark-scale filterbank based Cepstral features were evaluated under contaminated situations using five different types of noise at a range of signal to noise ratio (SNR) 10dB to -10dB. As the detailed analysis shown, the performance of Gammatone frequency integration is superior to Mel Frequency Cepstral Coefficient (MFCC) in different types of additive background noise and various SNR situations.
引用
收藏
页数:6
相关论文
共 50 条
  • [31] An auditory neural feature extraction method for robust speech recognition
    Guo, Wei
    Zhang, Liqing
    Xia, Bin
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 793 - +
  • [32] A robust feature extraction for automatic speech recognition in noisy environments
    Lima, C
    Almeida, LB
    Monteiro, JL
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 540 - 543
  • [33] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4354 - 4357
  • [34] Study of Robust Feature Extraction Techniques for Speech Recognition System
    Sharma, Usha
    Maheshkar, Sushila
    Mishra, A. N.
    2015 1ST INTERNATIONAL CONFERENCE ON FUTURISTIC TRENDS ON COMPUTATIONAL ANALYSIS AND KNOWLEDGE MANAGEMENT (ABLAZE), 2015, : 666 - 670
  • [35] Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1358 - 1361
  • [36] Speech feature extraction based on wavelet modulation scale for robust speech recognition
    Ma, Xin
    Zhou, Weidong
    Ju, Fang
    Jiang, Qi
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
  • [37] Spectral-Temporal Receptive Fields and MFCC Balanced Feature Extraction for Noisy Speech Recognition
    Wang, Jia-Ching
    Lin, Chang-Hong
    Chen, En-Ting
    Chang, Pao-Chi
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [38] On the Effects of Filterbank Design and Energy Computation on Robust Speech Recognition
    Dimitriadis, Dimitrios
    Maragos, Petros
    Potamianos, Alexandros
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (06): : 1504 - 1516
  • [39] The Research of Feature Extraction Based on MFCC for Speaker Recognition
    Zhang Wanli
    Li Guoxin
    2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 1074 - 1077
  • [40] Selective Gammatone Filterbank Feature for Robust Sound Event Recognition
    Leng, Yi Ren
    Huy Dat Tran
    Kitaoka, Norihide
    Li, Haizhou
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2246 - +