Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Cited by: 1
Authors
Jia-Ching Wang
Chien-Yao Wang
Yu-Hao Chin
Yu-Ting Liu
En-Ting Chen
Pao-Chi Chang
Affiliations
[1] National Central University, Department of Computer Science and Information Engineering
[2] National Central University, Department of Communication Engineering
Keywords
STRF; Speaker recognition; Feature extraction; Speaker authentication;
DOI
Not available
Abstract
This paper proposes a speaker recognition system using acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the performance of the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85% on clean speech and 18.49% on noisy speech. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
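The three feature-extraction steps described in the abstract (per-scale energy, logarithm, discrete cosine transform) and the concatenation with MFCCs can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the STRF representation of one frame is already available as an array of shape (scales, rates, frequency channels), that "energy" means the sum of squared magnitudes, and that the DCT is an unnormalized DCT-II. The function names, array layout, and dimensions (8 scales, 13 MFCCs) are hypothetical, and random data stands in for real STRF and MFCC values.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II of a 1-D array (assumed DCT variant)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]  # output coefficient index
    m = np.arange(n)[None, :]  # input sample index
    basis = np.cos(np.pi * (m + 0.5) * k / n)
    return basis @ x


def strf_scale_feature(strf, eps=1e-10):
    """Scale energies -> log -> DCT for one frame.

    strf: hypothetical array of shape (n_scales, n_rates, n_freqs).
    eps:  small constant (an assumption) that keeps the log finite.
    """
    # Step 1: energy of each scale, summed over rate and frequency.
    energy = np.sum(np.abs(strf) ** 2, axis=(1, 2))
    # Step 2: logarithm of the scale energies.
    log_energy = np.log(energy + eps)
    # Step 3: DCT of the log scale energies.
    return dct_ii(log_energy)


def combine_with_mfcc(strf_feat, mfcc):
    """Concatenate the STRF feature with an MFCC vector (assumed fusion)."""
    return np.concatenate([strf_feat, mfcc])


# Placeholder data only: random values stand in for a real STRF frame and MFCCs.
rng = np.random.default_rng(0)
strf_frame = rng.standard_normal((8, 6, 128))
mfcc_frame = rng.standard_normal(13)

feature = combine_with_mfcc(strf_scale_feature(strf_frame), mfcc_frame)
print(feature.shape)  # (21,)
```

In a full system, vectors like `feature` would be computed per frame or per utterance and passed to an SVM classifier. How the original work aggregates frames and balances the two feature types is not specified in the abstract, so it is left out here.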
Pages: 4055-4068
Number of pages: 13
Related papers
50 in total
  • [1] Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition
    Wang, Jia-Ching
    Wang, Chien-Yao
    Chin, Yu-Hao
    Liu, Yu-Ting
    Chen, En-Ting
    Chang, Pao-Chi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (03) : 4055 - 4068
  • [2] Spectral-Temporal Receptive Fields and MFCC Balanced Feature Extraction for Noisy Speech Recognition
    Wang, Jia-Ching
    Lin, Chang-Hong
    Chen, En-Ting
    Chang, Pao-Chi
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [3] A Modified MFCC Feature Extraction Technique For Robust Speaker Recognition
    Sharma, Diksha
    Ali, Israj
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 1052 - 1057
  • [4] Multitaper Based MFCC Feature Extraction for Robust Speaker Recognition System
    Bharath, K. P.
    Kumar, Rajesh M.
    2019 INNOVATIONS IN POWER AND ADVANCED COMPUTING TECHNOLOGIES (I-PACT), 2019,
  • [5] The Research of Feature Extraction Based on MFCC for Speaker Recognition
    Zhang Wanli
    Li Guoxin
    2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 1074 - 1077
  • [6] Exploration of Feature Reduction of MFCC Spectral Features in Speaker Recognition
    Kumar, Mohit
    Katti, Sachin
    Das, Pradip K.
    ADVANCED COMPUTING AND COMMUNICATION TECHNOLOGIES, 2016, 452 : 151 - 159
  • [7] Hardware Implementation of MFCC-Based Feature Extraction for Speaker Recognition
    Ehkan, P.
    Zakaria, F. F.
    Warip, M. N. M.
    Sauli, Z.
    Elshaikh, M.
    ADVANCED COMPUTER AND COMMUNICATION ENGINEERING TECHNOLOGY, 2015, 315 : 471 - 480
  • [8] An Auditory Feature Extraction Method for Robust Speaker Recognition
    Hu, Fengsong
    Cao, Xiaoyu
    PROCEEDINGS OF 2012 IEEE 14TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, 2012, : 1067 - 1071
  • [9] Feature Extraction from Temporal Phase for Speaker Recognition
    Gandhi, Ami
    Patil, Hemant A.
    2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 382 - 386
  • [10] Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds
    Theunissen, FE
    Sen, K
    Doupe, AJ
    JOURNAL OF NEUROSCIENCE, 2000, 20 (06) : 2315 - 2331