Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Cited by: 5
Authors
Sun, Yanqing [1 ]
Zhou, Yu [1 ]
Zhao, Qingwei [1 ]
Yan, Yonghong [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, ThinkIT Speech Lab, Beijing 100864, Peoples R China
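Source
Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;
Keywords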
mismatched speech; robust speech recognition; F-Ratio; subband design; feature optimization;
DOI
10.1587/transinf.E93.D.2417
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper addresses the performance degradation that occurs in mismatched speech recognition. F-Ratio analysis is used to measure the significance of different frequency bands for speech unit classification. We find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should receive more emphasis than they do in Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. By analogy with the Mel-frequency scale, a frequency scale called the F-Ratio-scale is therefore proposed to optimize the filter bank design for MFCC features, so that each subband carries equal significance for speech unit classification. Under comparable conditions, the modified features reduce the sentence error rate relative to MFCC by 43.20% for emotion-affected speech recognition, by 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and by 64.50% for three years of 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts and sampling rates.
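The two ideas in the abstract, scoring each frequency band by its F-Ratio and then placing subband edges so that every subband carries equal significance, can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the common definition of the F-Ratio (variance of the class means divided by the average within-class variance) and assumes that "equal significance" means equal area under the F-Ratio curve. The function names `f_ratio` and `equal_significance_edges` are hypothetical and are not taken from the paper.

```python
import numpy as np


def f_ratio(features, labels):
    """Per-dimension F-Ratio (assumed definition, not confirmed by the abstract):
    variance of the class means divided by the mean within-class variance."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)
    within = class_vars.mean(axis=0)
    # Small constant avoids division by zero for constant dimensions.
    return between / (within + 1e-12)


def equal_significance_edges(freqs, ratios, n_bands):
    """Subband edges such that each band covers an equal share of the area
    under the F-Ratio curve (assumed reading of 'equal significance')."""
    freqs = np.asarray(freqs, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    # Trapezoidal cumulative integral of the F-Ratio over frequency.
    steps = 0.5 * (ratios[1:] + ratios[:-1]) * np.diff(freqs)
    cumulative = np.concatenate([[0.0], np.cumsum(steps)])
    # Equally spaced targets on the cumulative axis, mapped back to frequency.
    targets = np.linspace(0.0, cumulative[-1], n_bands + 1)
    return np.interp(targets, cumulative, freqs)


if __name__ == "__main__":
    # Synthetic F-Ratio curve with a peak near 1 kHz (illustrative only,
    # not data from the paper).
    freqs = np.linspace(0.0, 4000.0, 401)
    ratios = 1.0 + 9.0 * np.exp(-(((freqs - 1000.0) / 200.0) ** 2))
    print(equal_significance_edges(freqs, ratios, 8))
```

With a flat F-Ratio curve the edges fall on a linear frequency grid; where the curve peaks, the subbands become narrower, which is one way a filter bank could give those frequencies more resolution. Triangular filters would then be placed on these edges in the same way as on Mel-spaced edges.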
Pages: 2417-2430
Page count: 14
Related papers
50 in total
  • [31] Double Gaussian based feature normalization for robust speech recognition
    Liu, B
    Dai, LR
    Li, JY
    Wang, RH
    2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2004, : 253 - 256
  • [32] Feature extraction based on auditory representations for robust speech recognition
    Kim, DS
    Lee, SY
    Kil, RM
    Zhu, XL
    ELECTRONICS LETTERS, 1997, 33 (01) : 15 - 16
  • [33] A Multichannel Feature-Based Processing for Robust Speech Recognition
    Souden, Mehrez
    Kinoshita, Keisuke
    Delcroix, Marc
    Nakatani, Tomohiro
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 696 - 699
  • [34] Speech feature extraction based on wavelet modulation scale for robust speech recognition
    Ma, Xin
    Zhou, Weidong
    Ju, Fang
    Jiang, Qi
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
  • [35] Missing-Feature-Theory-based Robust Simultaneous Speech Recognition System with Non-clean Speech Acoustic Model
    Takahashi, Toru
    Nakadai, Kazuhiro
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 2730 - 2735
  • [36] On the Jointly Unsupervised Feature Vector Normalization and Acoustic Model Compensation for Robust Speech Recognition
    Buera, Luis
    Miguel, Antonio
    Lleida, Eduardo
    Saz, Oscar
    Ortega, Alfonso
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1381 - 1384
  • [37] Feature-based Noise Robust Speech Recognition on an Indonesian Language Automatic Speech Recognition System
    Satriawan, Cil Hardianto
    Lestari, Dessi Puji
    2014 International Conference on Electrical Engineering and Computer Science (ICEECS), 2014, : 42 - 46
  • [38] Feature extraction for speech recognition based on orthogonal acoustic-feature planes and LDA
    Nitta, T
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 421 - 424
  • [39] Sequential MAP estimation based speech feature enhancement for noise robust speech recognition
    Jia, C
    Ding, P
    Xu, B
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 412 - 415
  • [40] EFFECT OF FEATURE SMOOTHING FOR ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 73 - 76