A clustering based feature selection method in spectro-temporal domain for speech recognition

Cited by: 14
Authors
Esfandian, Nafiseh [1]
Razzazi, Farbod [1]
Behrad, Alireza [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
[2] Shahed Univ, Fac Engn, Tehran, Iran
Keywords
Speech recognition; Spectro-temporal model; Feature extraction; Clustering; Gaussian mixture models; Weighted K-means; Representations
DOI
10.1016/j.engappai.2012.04.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are taken as the attributes of the secondary feature vector of each frame. To evaluate the proposed approach, the new feature vectors were tested on phoneme classification within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives over MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained using WKM clustering in the classification of front vowels. (C) 2012 Elsevier Ltd. All rights reserved.
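The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the farthest-point initialisation, the diagonal (per-axis) variance used in place of full covariance matrices, and the ordering of clusters inside the output vector are all assumptions made for illustration. Each frame is assumed to be given as a list of points in the spectro-temporal space, with a non-negative weight per point (for example, the magnitude of the representation at that point).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=100):
    """Weighted K-means: each centroid is the weight-averaged mean of its cluster."""
    # Deterministic farthest-point initialisation (an assumption made so the
    # sketch is reproducible; the abstract does not describe initialisation).
    start = max(range(len(points)), key=lambda i: weights[i])
    centroids = [list(points[start])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    dim = len(points[0])
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: weighted mean of the points assigned to each cluster.
        for c in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, new_labels) if l == c]
            total = sum(w for _, w in members)
            if total > 0:
                centroids[c] = [sum(w * p[d] for p, w in members) / total
                                for d in range(dim)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def secondary_features(points, weights, k):
    """Concatenate, per cluster, the centroid and the weighted per-axis variance."""
    centroids, labels = weighted_kmeans(points, weights, k)
    dim = len(points[0])
    feats = []
    # Hypothetical convention: order clusters by centroid so that the layout
    # of the feature vector is consistent from frame to frame.
    for c in sorted(range(k), key=lambda c: centroids[c]):
        members = [(p, w) for p, w, l in zip(points, weights, labels) if l == c]
        total = sum(w for _, w in members)
        var = [sum(w * (p[d] - centroids[c][d]) ** 2 for p, w in members) / total
               if total > 0 else 0.0
               for d in range(dim)]
        feats.extend(centroids[c])
        feats.extend(var)
    return feats


if __name__ == "__main__":
    # Toy frame: four points in a 2-D space with unequal weights.
    frame_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    frame_weights = [1.0, 3.0, 2.0, 2.0]
    print(secondary_features(frame_points, frame_weights, k=2))
```

With `k` clusters in a `d`-dimensional space this sketch produces `2 * k * d` values per frame, which shows how clustering replaces a large spectro-temporal representation with a short vector. A GMM variant would swap the clustering step for expectation-maximisation and take the component means and covariances instead.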
Pages: 1194-1202
Page count: 9
Related papers
50 in total
  • [1] A Feature Selection Method in Spectro-Temporal Domain Based on Gaussian Mixture Models
    Esfandian, Nafiseh
    Razzazi, Farbod
    Behrad, Alireza
    Valipour, Sara
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 522 - +
  • [2] DeepCNN: Spectro-temporal feature representation for speech emotion recognition
    Saleem, Nasir
    Gao, Jiechao
    Irfan, Rizwana
    Almadhor, Ahmad
    Rauf, Hafiz Tayyab
    Zhang, Yudong
    Kadry, Seifedine
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (02) : 401 - 417
  • [3] A Paradigm for Limited Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets
    Chaudhuri, Sourish
    Raj, Bhiksha
    Ezzat, Tony
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3176 - +
  • [4] Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition
    Duc Hoang Ha Nguyen
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (06) : 1006 - 1019
  • [5] Spectro-Temporal Modulations for Robust Speech Emotion Recognition
    Yeh, Lan-Ying
    Chi, Tai-Shih
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 789 - 792
  • [6] Hierarchical spectro-temporal features for robust speech recognition
    Domont, Xavier
    Heckmann, Martin
    Joublin, Frank
    Goerick, Christian
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4417 - 4420
  • [7] Hilbert Envelope Based Spectro-Temporal Features for Phoneme Recognition in Telephone Speech
    Thomas, Samuel
    Ganapathy, Sriram
    Hermansky, Hynek
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1521 - +
  • [8] Data-Driven and Feedback Based Spectro-Temporal Features for Speech Recognition
    Sivaram, G. S. V. S.
    Nemala, Sridhar Krishna
    Mesgarani, Nima
    Hermansky, Hynek
    IEEE SIGNAL PROCESSING LETTERS, 2010, 17 (11) : 957 - 960
  • [9] Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
    Geng, Mengzhe
    Liu, Shansong
    Yu, Jianwei
    Xie, Xurong
    Hu, Shoukang
    Ye, Zi
    Jin, Zengrui
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2021, 2021, : 4793 - 4797
  • [10] Methods for capturing spectro-temporal modulations in automatic speech recognition
    Kleinschmidt, M
    ACTA ACUSTICA UNITED WITH ACUSTICA, 2002, 88 (03) : 416 - 422