A clustering based feature selection method in spectro-temporal domain for speech recognition

Cited by: 14
Authors
Esfandian, Nafiseh [1]
Razzazi, Farbod [1]
Behrad, Alireza [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
[2] Shahed Univ, Fac Engn, Tehran, Iran
Keywords
Speech recognition; Spectro-temporal model; Feature extraction; Clustering; Gaussian mixture models; Weighted K-means; Representations
DOI
10.1016/j.engappai.2012.04.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of the feature space, which makes this domain unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied to the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are used as attributes of the secondary feature vector of each frame. To evaluate the efficiency of the proposed approach, the new feature vectors were tested on phoneme classification within the main phoneme categories of the TIMIT database. Employing the proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives compared with MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained by using WKM clustering in the classification of front vowels compared with MFCC features. (C) 2012 Elsevier Ltd. All rights reserved.
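The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic input, the choice of three coordinates per point, and the use of per-dimension variances in place of full covariance matrices are all assumptions made for illustration. Each point stands for a location in a spectro-temporal space, and its weight stands for the response magnitude at that location.

```python
import numpy as np


def weighted_kmeans(points, weights, k, n_iter=20, seed=0):
    """Weighted K-means: each centroid is the weight-averaged mean of its members."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct points, sampled in proportion to weight.
    idx = rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())
    centroids = points[idx].astype(float)

    def assign(cents):
        # Nearest centroid by squared Euclidean distance.
        dists = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:  # an empty cluster keeps its old centroid
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centroids, assign(centroids)


def secondary_features(points, weights, k):
    """Hypothetical helper: flatten centroids and weighted variances into one vector."""
    centroids, labels = weighted_kmeans(points, weights, k)
    variances = np.zeros_like(centroids)
    for j in range(k):
        mask = labels == j
        if weights[mask].sum() > 0:
            diff2 = (points[mask] - centroids[j]) ** 2
            # Assumption: diagonal variances stand in for full covariance matrices.
            variances[j] = np.average(diff2, axis=0, weights=weights[mask])
    # Sort clusters by their first coordinate so the vector layout is stable.
    order = np.argsort(centroids[:, 0])
    return np.concatenate([centroids[order].ravel(), variances[order].ravel()])


# Synthetic stand-in for one frame: 200 points in a 3-D space with positive weights.
rng = np.random.default_rng(1)
points = rng.normal(size=(200, 3))
weights = rng.random(200) + 1e-3
features = secondary_features(points, weights, k=4)
print(features.shape)  # (24,)
```

With k clusters in a d-dimensional space, this sketch produces a vector of length 2kd per frame, regardless of how many points the original representation contained. That fixed, small size is the dimensionality reduction the abstract refers to; the actual cluster counts and feature layout used in the paper are not given in this record.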
Pages: 1194-1202
Number of pages: 9