Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification

Cited by: 53
Authors
Sadjadi, Seyed Omid [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect Engn, Ctr Robust Speech Syst CRSS, Richardson, TX 75080 USA
Funding
U.S. National Science Foundation
Keywords
Language identification; MHEC; Mismatch conditions; Robust features; Speaker identification; MULTITAPER MFCC; SPEECH; NOISE; CLASSIFICATION; VERIFICATION; MODULATIONS; RECOGNITION; FEATURES
DOI
10.1016/j.specom.2015.04.005
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Adverse noisy conditions pose great challenges to automatic speech applications including speaker and language identification (SID and LID), where mel-frequency cepstral coefficients (MFCC) are the most commonly adopted acoustic features. Although systems trained using MFCCs provide competitive performance under matched conditions, it is well known that such systems are susceptible to acoustic mismatch between training and test conditions due to noise and channel degradations. Motivated by this fact, this study proposes an alternative noise-robust acoustic feature front-end that is capable of capturing speaker identity as well as language structure/content conveyed in the speech signal. Specifically, a feature extraction procedure inspired by human auditory processing is proposed. The proposed feature is based on the Hilbert envelope of Gammatone filterbank outputs that represent the envelope of the auditory nerve response. The subband amplitude modulations, which are captured through smoothed Hilbert envelopes (a.k.a. temporal envelopes), carry useful acoustic information and have been shown to be robust to signal degradations. The effectiveness of the proposed front-end, termed mean Hilbert envelope coefficients (MHEC), is evaluated in the context of SID and LID tasks using degraded speech material from the DARPA Robust Automatic Transcription of Speech (RATS) program. In addition, we investigate how the dynamic range compression stage in the MHEC feature extraction process affects performance, using logarithmic and power-law non-linearities. Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law non-linearity consistently yields the best performance across different conditions for both SID and LID tasks. © 2015 Elsevier B.V. All rights reserved.
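The abstract names the stages of the front-end (Gammatone filterbank, smoothed Hilbert envelope, frame-level mean, logarithmic or power-law compression) without giving parameter values. The following is a minimal NumPy sketch of such a pipeline, not the authors' implementation. Everything numeric here is an assumption chosen for illustration: the FIR Gammatone approximation, log-spaced center frequencies, the 5 ms moving-average smoother, the 25 ms / 10 ms framing, the power-law exponent `alpha`, and the final DCT that turns compressed band values into cepstral-style coefficients. The function names are hypothetical.

```python
import numpy as np

def erb(fc):
    # Glasberg-Moore equivalent rectangular bandwidth (Hz) at center frequency fc.
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_fir(fc, fs, order=4, duration=0.025):
    # Truncated Gammatone impulse response, normalized to unit energy.
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def hilbert_envelope(x):
    # Magnitude of the analytic signal, built by zeroing negative frequencies.
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def mhec_sketch(x, fs, n_bands=24, n_ceps=12, frame=0.025, hop=0.010,
                compression="power", alpha=1.0 / 15.0):
    # All default values are illustrative assumptions, not taken from the paper.
    centers = np.geomspace(100.0, 0.9 * fs / 2, n_bands)  # assumed spacing
    flen, fhop = int(frame * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    width = int(0.005 * fs)
    smoother = np.ones(width) / width  # moving average as a stand-in low-pass
    means = np.empty((n_frames, n_bands))
    for j, fc in enumerate(centers):
        band = np.convolve(x, gammatone_fir(fc, fs), mode="same")
        env = np.convolve(hilbert_envelope(band), smoother, mode="same")
        for i in range(n_frames):
            # Mean of the smoothed envelope over one analysis frame.
            means[i, j] = env[i * fhop:i * fhop + flen].mean()
    means = np.maximum(means, 1e-10)  # guard against log(0)
    if compression == "log":
        compressed = np.log(means)
    else:
        compressed = means ** alpha  # power-law (root) compression
    # Unnormalized DCT-II across bands to decorrelate the compressed values.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi * k * (m + 0.5) / n_bands)
    return compressed @ dct.T
```

Calling `mhec_sketch(x, fs, compression="log")` versus `compression="power"` on the same signal switches only the compression stage, which is the comparison the abstract describes. The sketch cannot reproduce the reported results, since the paper's actual filter design and parameters are not stated here.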
Pages: 138-148
Page count: 11