Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Cited by: 5
Authors
Sun, Yanqing [1]
Zhou, Yu [1]
Zhao, Qingwei [1]
Yan, Yonghong [1]
Affiliations
[1] Chinese Academy of Sciences, Institute of Acoustics, ThinkIT Speech Lab, Beijing 100864, People's Republic of China
Source
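Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China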
Keywords
mismatched speech; robust speech recognition; F-Ratio; subband design; feature optimization
DOI
10.1587/transinf.E93.D.2417
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
This paper addresses performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to measure the significance of different frequency bands for speech unit classification. We find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are in the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. By analogy with the Mel-frequency scale, a new frequency scale called the F-Ratio-scale is therefore proposed to optimize the filter bank design for the MFCC features, so that each subband carries equal significance for speech unit classification. Under comparable conditions, the modified features yield relative reductions in sentence error rate over MFCC of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and 64.50% for three years of 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts, and sampling rates.
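The abstract does not give the paper's exact F-Ratio formula or how the F-Ratio-scale is constructed, so the following is only a minimal sketch under stated assumptions. It uses the common definition of the F-ratio (variance of the class means divided by the mean within-class variance) and one plausible way to obtain "equal significance" subbands: placing band edges at equal steps of the cumulative F-ratio curve. The function names `f_ratio` and `equal_significance_edges` are hypothetical, and the input values in the usage note are toy numbers, not data from the paper.

```python
from statistics import fmean, pvariance


def f_ratio(classes):
    """F-ratio of one feature (assumed common definition).

    `classes` is a list of sample lists, one per speech unit class.
    Returns variance of the class means / mean within-class variance.
    """
    means = [fmean(c) for c in classes]
    between = pvariance(means)
    within = fmean([pvariance(c) for c in classes])
    return between / within


def equal_significance_edges(freqs, ratios, n_bands):
    """Band edges such that each band holds roughly an equal share of
    the cumulative F-ratio (assumed construction, not the paper's).

    `freqs` are increasing frequency points in Hz, `ratios` the F-ratio
    measured at each point; the total F-ratio must be positive.
    """
    # Cumulative F-ratio over frequency, trapezoidal rule.
    cum = [0.0]
    for i in range(1, len(freqs)):
        step = 0.5 * (ratios[i] + ratios[i - 1]) * (freqs[i] - freqs[i - 1])
        cum.append(cum[-1] + step)

    edges = [freqs[0]]
    i = 1
    for k in range(1, n_bands):
        target = cum[-1] * k / n_bands
        # Advance to the segment where the cumulative curve reaches target.
        while cum[i] < target:
            i += 1
        # Linear interpolation inside the segment (an approximation).
        frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
        edges.append(freqs[i - 1] + frac * (freqs[i] - freqs[i - 1]))
    edges.append(freqs[-1])
    return edges
```

With a flat F-ratio curve, for example `equal_significance_edges([0, 1000, 2000, 3000, 4000], [1, 1, 1, 1, 1], 4)`, the edges fall on a uniform grid. Where the measured F-ratio is higher, the bands become narrower, which is the sense in which such a scale would emphasize those frequencies.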
Pages: 2417-2430
Number of pages: 14