Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Cited: 5
Authors
Sun, Yanqing [1 ]
Zhou, Yu [1 ]
Zhao, Qingwei [1 ]
Yan, Yonghong [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, ThinkIT Speech Lab, Beijing 100864, Peoples R China
Source
Funding
National High-Tech Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords
mismatched speech; robust speech recognition; F-Ratio; subband design; feature optimization;
DOI
10.1587/transinf.E93.D.2417
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
This paper addresses the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to assess the significance of different frequency bands for speech-unit classification, and we find that the frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are by the Mel-frequency cepstral coefficients (MFCC). This analysis result is further observed to be stable across several typical mismatched situations. By analogy with the Mel-frequency scale, a new frequency scale called the F-Ratio scale is therefore proposed to optimize the filter-bank design for the MFCC features, so that each subband carries equal significance for speech-unit classification. Under comparable conditions, the modified features yield a relative 43.20% reduction in sentence error rate over MFCC for emotion-affected speech recognition, reductions of 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and a 64.50% reduction on three years of 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts, and sampling rates.
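As an illustration of the band-significance analysis the abstract describes, the sketch below computes the standard F-Ratio statistic (between-class variance of the per-band class means divided by the average within-class variance) from filter-bank energies grouped by speech unit. The function name, data layout, and use of NumPy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def f_ratio(features_by_class):
    """Per-band F-Ratio for a list of arrays, one array per speech unit.

    Each array has shape (n_samples, n_bands). Bands with a high
    F-Ratio separate the speech units well and, in the spirit of the
    paper, deserve finer filter-bank resolution.
    """
    global_mean = np.mean(np.vstack(features_by_class), axis=0)
    class_means = np.array([c.mean(axis=0) for c in features_by_class])
    # Between-class variance: spread of the class means around the global mean.
    between = np.mean((class_means - global_mean) ** 2, axis=0)
    # Within-class variance: average per-band variance inside each class.
    within = np.mean([c.var(axis=0) for c in features_by_class], axis=0)
    return between / within
```

Filters can then be allotted to frequency bands in proportion to this statistic, which is the idea behind the proposed F-Ratio scale.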
Pages: 2417-2430 (14 pages)