A clustering based feature selection method in spectro-temporal domain for speech recognition

Cited by: 14
Authors
Esfandian, Nafiseh [1 ]
Razzazi, Farbod [1 ]
Behrad, Alireza [2 ]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
[2] Shahed Univ, Fac Engn, Tehran, Iran
Keywords
Speech recognition; Spectro-temporal model; Feature extraction; Clustering; Gaussian mixture models; Weighted K-means; Representations
DOI
10.1016/j.engappai.2012.04.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes it unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied in the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are used as the attributes of the secondary feature vector of each frame. To evaluate the proposed approach, the new feature vectors were tested on phoneme classification within the main phoneme categories of the TIMIT database. The proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives over MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained by using WKM clustering in the classification of front vowels. (C) 2012 Elsevier Ltd. All rights reserved.
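The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that one frame is available as a set of points in a spectro-temporal space (the `points` array) with a positive response magnitude per point (the `weights` array); both names, the function names, the use of diagonal variances in place of full covariance matrices, and the sorting of clusters by their first coordinate are illustrative assumptions.

```python
import numpy as np


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd-style K-means in which each point counts in proportion to its weight."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise from k distinct points, sampled in proportion to weight.
    idx = rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())
    centroids = points[idx].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():
                # Update step: weighted mean of the points assigned to cluster j.
                new_centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the final centroids.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def secondary_features(points, weights, k):
    """Concatenate centroid and weighted per-dimension spread of every cluster."""
    centroids, labels = weighted_kmeans(points, weights, k)
    feats = []
    # Assumption: sort clusters by first coordinate so the layout is stable across frames.
    for j in np.argsort(centroids[:, 0]):
        mask = labels == j
        if mask.any():
            # Diagonal simplification of the cluster covariance matrix.
            spread = np.average((points[mask] - centroids[j]) ** 2, axis=0, weights=weights[mask])
        else:
            spread = np.zeros(points.shape[1])
        feats.append(np.concatenate([centroids[j], spread]))
    return np.concatenate(feats)


# Synthetic stand-in for one frame: 200 points in a 3-dimensional space.
demo_rng = np.random.default_rng(1)
demo_points = demo_rng.normal(size=(200, 3))
demo_weights = demo_rng.random(200) + 0.1  # strictly positive weights
frame_vector = secondary_features(demo_points, demo_weights, k=4)
print(frame_vector.shape)  # (24,) = 4 clusters x (3 centroid + 3 spread values)
```

In this sketch the length of the secondary feature vector depends only on the number of clusters and the dimensionality of the point space, not on the number of points in the frame, which is the sense in which clustering reduces the dimensionality. A GMM variant could be sketched along the same lines by fitting a mixture model and reading off its means and covariances.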
Pages: 1194-1202
Page count: 9
Related papers
50 items in total
  • [21] POINT PROCESS MODELS OF SPECTRO-TEMPORAL MODULATION EVENTS FOR SPEECH RECOGNITION
    Jansen, Aren
    Mesgarani, Nima
    Niyogi, Partha
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 104 - 108
  • [22] A Novel Spectro-Temporal Feature Extraction Method for Phoneme Classification
    Fartash, Mehdi
    Setayeshi, Saeed
    Razzazi, Farbod
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 569 - +
  • [23] Development of spectro-temporal features of speech in children
    Gautam S.
    Singh L.
    Gautam, Sumanlata (suman.gautam82@gmail.com), 1600, Springer Science and Business Media, LLC (20): : 543 - 551
  • [24] SPECTRO-TEMPORAL NEURAL FACTORIZATION FOR SPEECH DEREVERBERATION
    Chien, Jen-Tzung
    Kuo, Kuan-Ting
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5449 - 5453
  • [25] Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation
    Choi, Yong-Sun
    Lee, Soo-Young
    NEURAL NETWORKS, 2013, 45 : 62 - 69
  • [26] A hierarchical framework for spectro-temporal feature extraction
    Heckmann, Martin
    Domont, Xavier
    Joublin, Frank
    Goerick, Christian
    SPEECH COMMUNICATION, 2011, 53 (05) : 736 - 752
  • [27] Localized spectro-temporal cepstral analysis of speech
    Bouvrie, Jake
    Ezzat, Tony
    Poggio, Tomaso
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4733 - 4736
  • [28] Informative Spectro-Temporal Bottleneck Features for Noise-Robust Speech Recognition
    Chang, Shuo-Yiin
    Morgan, Nelson
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 99 - 103
  • [29] Speaker sex effects on temporal and spectro-temporal measures of speech
    Herrmann, Frank
    Cunningham, Stuart P.
    Whiteside, Sandra P.
    JOURNAL OF THE INTERNATIONAL PHONETIC ASSOCIATION, 2014, 44 (01) : 59 - 74
  • [30] A Clustering-based Approach for Features Extraction in Spectro-temporal Domain using Artificial Neural Network
    Esfandian, N.
    Hosseinpour, K.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2021, 34 (02): : 452 - 457