Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Cited by: 5

Authors:

Sun, Yanqing [1]

Zhou, Yu [1]

Zhao, Qingwei [1]

Yan, Yonghong [1]

Affiliation:

[1] Chinese Academy of Sciences, Institute of Acoustics, ThinkIT Speech Lab, Beijing 100864, People's Republic of China

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010 / Vol. E93D / No. 09

Funding:

National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;

Keywords:

mismatched speech; robust speech recognition; F-Ratio; subband design; feature optimization;

DOI:

10.1587/transinf.E93.D.2417

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper addresses performance degradation in mismatched speech recognition. F-Ratio analysis is used to measure the significance of different frequency bands for speech unit classification. We find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are in the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. By analogy with the Mel-frequency scale, another frequency scale, called the F-Ratio-scale, is therefore proposed to optimize the filter bank design for the MFCC features so that each subband carries equal significance for speech unit classification. Under comparable conditions, the modified features yield relative reductions in sentence error rate, compared with MFCC, of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and 64.50% for three years of 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts and sampling rates.
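
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and this record does not give their exact formulation. It assumes the common definition of the F-ratio (variance of the class means divided by the mean within-class variance) and places band edges so that each subband holds an equal share of the cumulative F-ratio. The function names f_ratio and equal_significance_edges, and all parameters, are hypothetical.

```python
import numpy as np


def f_ratio(features, labels):
    """Per-dimension F-ratio, assuming the common definition:
    variance of the class means / mean of the within-class variances.

    features: array of shape (n_frames, n_bins), e.g. log energies per frequency bin
    labels:   array of shape (n_frames,), the speech-unit class of each frame
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)   # spread of the class means
    within = class_vars.mean(axis=0)    # average spread inside a class
    return between / (within + 1e-12)   # small constant avoids division by zero


def equal_significance_edges(freqs, scores, n_bands):
    """Band edges (same unit as freqs) such that each band covers an equal
    share of the integrated score. Assumes freqs is strictly increasing and
    scores are strictly positive, so the cumulative curve is invertible.
    """
    freqs = np.asarray(freqs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Trapezoidal cumulative integral of the score over frequency.
    steps = 0.5 * (scores[1:] + scores[:-1]) * np.diff(freqs)
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    cum /= cum[-1]
    # Invert the cumulative curve at equally spaced targets.
    targets = np.linspace(0.0, 1.0, n_bands + 1)
    return np.interp(targets, cum, freqs)
```

With a flat score the edges come out uniformly spaced; where the score peaks, the edges crowd together, which gives narrower subbands and therefore more resolution in the frequency regions that discriminate speech units best.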

Pages: 2417-2430

Page count: 14

