Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Cited by: 1

Authors:

Jia-Ching Wang

Chien-Yao Wang

Yu-Hao Chin

Yu-Ting Liu

En-Ting Chen

Pao-Chi Chang

Affiliations:

[1] National Central University, Department of Computer Science and Information Engineering

[2] National Central University, Department of Communication Engineering

Source:

Multimedia Tools and Applications | 2017 / Vol. 76

Keywords:

STRF; Speaker recognition; Feature extraction; Speaker authentication;

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

This paper proposes a speaker recognition system using acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the performance of the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rates by 3.85% and 18.49% on clean and noisy speech, respectively. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
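
The three steps named in the abstract (per-scale energy, logarithm, discrete cosine transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the array layout (scale on axis 0, with rate, time, and frequency on the remaining axes), the energy definition as a sum of squared magnitudes, the unnormalized DCT-II, the small `eps` offset, and the function names `dct_ii`, `strf_feature`, and `combine_features` are all assumptions made for illustration. Combining the STRF feature with MFCCs by simple concatenation is likewise an assumption; the abstract does not say how the two are combined.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II of a 1-D array:
    X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = x.shape[0]
    idx = np.arange(n)
    basis = np.cos(np.pi / n * (idx[None, :] + 0.5) * idx[:, None])
    return basis @ x


def strf_feature(strf, eps=1e-12):
    """Sketch of the abstract's pipeline: scale energies -> log -> DCT.

    strf: hypothetical STRF representation with scale on axis 0
          (the remaining axes, e.g. rate/time/frequency, are assumed).
    eps:  assumed small offset that keeps the logarithm finite.
    """
    # Step 1: energy of each scale (assumed: sum of squared magnitudes).
    other_axes = tuple(range(1, strf.ndim))
    energies = np.sum(np.abs(strf) ** 2, axis=other_axes)
    # Step 2: logarithm of the scale energies.
    log_energies = np.log(energies + eps)
    # Step 3: discrete cosine transform of the log energies.
    return dct_ii(log_energies)


def combine_features(strf_feat, mfcc_feat):
    """Assumed combination rule: concatenate STRF and MFCC vectors."""
    return np.concatenate([strf_feat, mfcc_feat])


# Usage with synthetic data: 8 scales, 6 rates, 20 frames, 32 channels.
rng = np.random.default_rng(0)
demo_strf = rng.standard_normal((8, 6, 20, 32))
demo_feature = strf_feature(demo_strf)
demo_combined = combine_features(demo_feature, np.zeros(13))
```

The structure mirrors the MFCC computation, with scale energies taking the place of Mel filterbank energies, which is one reason the two feature vectors can plausibly be stacked and passed to a single classifier such as an SVM.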

Pages: 4055-4068

Page count: 13
