Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models

Times cited: 0
Authors
Mallidi, Sri Harish [1]
Ganapathy, Sriram [2]
Hermansky, Hynek [1]
Affiliations
[1] Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD 21218, USA
[2] IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Keywords
Rate-Scale Filtering; Autoregressive Modeling; Speaker Recognition; Robust Feature Extraction
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition in noisy environments is challenging when there is a mismatch between the data used for enrollment and verification. In this paper, we propose a robust feature extraction scheme based on spectro-temporal modulation filtering using two-dimensional (2-D) autoregressive (AR) models. In the first step, the sub-band temporal envelopes are modeled with AR models by applying linear prediction to the sub-band discrete cosine transform (DCT) components. These sub-band envelopes are stacked together and used in a second AR modeling step, which approximates the spectral envelope across the sub-bands; cepstral features derived from this model are used for speaker recognition. AR modeling emphasizes the high-energy regions, which are relatively well preserved in the presence of noise. The degree of modulation filtering is controlled by the AR model order. Experiments are performed on noisy versions of the NIST 2010 speaker recognition evaluation (SRE) data with a state-of-the-art speaker recognition system. In these experiments, the proposed features provide significant improvements over the baseline features (relative improvements of 20% in equal error rate (EER) and 35% in miss rate at a 10% false-alarm rate).
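The two-step procedure described in the abstract can be illustrated with a minimal NumPy/SciPy sketch. It is not the authors' implementation: the uniform split of DCT coefficients into sub-bands, sampling each envelope directly at `n_frames` points instead of short-term windowing, the absence of any envelope compression, the function names, and all parameter values (`n_bands`, `order_t`, `order_s`, `n_ceps`) are hypothetical choices made for illustration. Step 1 applies linear prediction to each sub-band's DCT coefficients (frequency-domain linear prediction), so the all-pole "spectrum" of each model traces that sub-band's temporal envelope; step 2 treats the stacked envelope values at each frame as a power spectrum, fits a second AR model across sub-bands, and converts it to cepstra.

```python
import numpy as np
from scipy.fft import dct


def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation r[0..order].

    Returns (a, err) with a[0] = 1 and A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order;
    err is the final prediction-error power (the AR model gain).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z); c[0] (log gain) is dropped."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def fdlp_envelopes(x, n_bands=20, order_t=20, n_frames=100):
    """Step 1 (temporal AR): linear prediction on the DCT coefficients of each sub-band.

    Returns an (n_bands, n_frames) array of AR-modelled sub-band temporal envelopes.
    """
    assert len(x) // n_bands > order_t, "each sub-band needs more DCT bins than order_t"
    c = dct(x, type=2, norm="ortho")
    bands = np.array_split(c, n_bands)      # uniform split: an assumption of this sketch
    env = np.empty((n_bands, n_frames))
    for b, cb in enumerate(bands):
        full = np.correlate(cb, cb, mode="full")
        r = full[len(cb) - 1:len(cb) + order_t].copy()
        r[0] = r[0] * (1.0 + 1e-9) + 1e-12  # tiny regularizer (e.g. for silent input)
        a, err = levinson(r, order_t)
        # Evaluate err / |A(e^{jw})|^2 at w = pi * t / n_frames. In FDLP, w in [0, pi)
        # maps onto time in [0, len(x)), so this samples the envelope at n_frames points.
        A = np.fft.rfft(a, 2 * n_frames)[:n_frames]
        env[b] = err / np.abs(A) ** 2
    return env


def spectral_ar_cepstra(env, order_s=12, n_ceps=13):
    """Step 2 (spectral AR): per frame, treat the sub-band envelope values as a power
    spectrum on [0, pi], fit an AR model across sub-bands, and convert it to cepstra.
    Returns an (n_frames, n_ceps) feature matrix."""
    n_bands, n_frames = env.shape
    assert 2 * (n_bands - 1) > order_s, "too few sub-bands for order_s"
    feats = np.empty((n_frames, n_ceps))
    for t in range(n_frames):
        r = np.fft.irfft(env[:, t])[:order_s + 1]   # autocorrelation (Wiener-Khinchin)
        a, _ = levinson(r, order_s)
        feats[t] = lpc_to_cepstrum(a, n_ceps)
    return feats


def ar2d_features(x, n_bands=20, order_t=20, order_s=12, n_frames=100, n_ceps=13):
    """Full 2-D AR pipeline: temporal AR per sub-band, then spectral AR per frame."""
    env = fdlp_envelopes(x, n_bands, order_t, n_frames)
    return spectral_ar_cepstra(env, order_s, n_ceps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)           # stand-in for a 1 s segment at 8 kHz
    print(ar2d_features(x).shape)           # (100, 13)
```

In this sketch, lowering `order_t` smooths the envelopes along time and lowering `order_s` smooths them across sub-bands, which is one way to read the statement that the AR model order controls the degree of modulation filtering. Because all-pole models fit spectral peaks more closely than valleys, both steps favour the high-energy regions that the abstract describes as relatively robust to noise.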
Pages: 3656-3660
Number of pages: 5