Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Cited by: 1
Authors
Jia-Ching Wang
Chien-Yao Wang
Yu-Hao Chin
Yu-Ting Liu
En-Ting Chen
Pao-Chi Chang
Affiliations
[1] National Central University, Department of Computer Science and Information Engineering
[2] National Central University, Department of Communication Engineering
Keywords
STRF; Speaker recognition; Feature extraction; Speaker authentication
DOI
Not available
Abstract
This paper proposes a speaker recognition system using acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to the log energies to generate the STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the proposed system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85% and 18.49% on clean and noisy speech, respectively. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
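The three steps named in the abstract (per-scale energy, logarithm, discrete cosine transform) and the concatenation with MFCCs can be illustrated with a minimal sketch. This is not the authors' implementation: the `strf_frame[scale][rate][frequency]` layout, the summing of squared magnitudes as the energy, the small `eps` floor, the unnormalized DCT-II, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def scale_energies(strf_frame):
    """Sum squared magnitudes over rate and frequency for each scale.

    Assumed (hypothetical) layout: strf_frame[s][r][f] holds the STRF
    response of one frame at scale index s, rate index r, channel f.
    """
    return [
        sum(abs(v) ** 2 for rate_row in scale_slice for v in rate_row)
        for scale_slice in strf_frame
    ]


def dct_ii(x):
    """Unnormalized DCT-II of a sequence (normalization is an assumption)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def strf_feature(strf_frame, eps=1e-12):
    """Scale energies -> log -> DCT, following the order in the abstract."""
    energies = scale_energies(strf_frame)
    # eps is an assumed guard against log(0); the abstract does not mention it.
    log_energies = [math.log(e + eps) for e in energies]
    return dct_ii(log_energies)


def combined_feature(strf_frame, mfcc):
    """Concatenate the STRF feature with an externally computed MFCC vector."""
    return strf_feature(strf_frame) + list(mfcc)
```

In this sketch the combined vector would then be passed to an SVM classifier of the reader's choice. How the paper weights or balances the two feature streams is not stated in the abstract, so plain concatenation is used here only as a placeholder.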
Pages: 4055-4068
Page count: 13