Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification

Cited by: 53
Authors
Sadjadi, Seyed Omid [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect Engn, Ctr Robust Speech Syst CRSS, Richardson, TX 75080 USA
Funding
U.S. National Science Foundation
Keywords
Language identification; MHEC; Mismatch conditions; Robust features; Speaker identification; MULTITAPER MFCC; SPEECH; NOISE; CLASSIFICATION; VERIFICATION; MODULATIONS; RECOGNITION; FEATURES;
DOI
10.1016/j.specom.2015.04.005
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Adverse noisy conditions pose great challenges to automatic speech applications including speaker and language identification (SID and LID), where mel-frequency cepstral coefficients (MFCC) are the most commonly adopted acoustic features. Although systems trained using MFCCs provide competitive performance under matched conditions, it is well known that such systems are susceptible to acoustic mismatch between training and test conditions due to noise and channel degradations. Motivated by this fact, this study proposes an alternative noise-robust acoustic feature front-end that is capable of capturing speaker identity as well as language structure/content conveyed in the speech signal. Specifically, a feature extraction procedure inspired by human auditory processing is proposed. The proposed feature is based on the Hilbert envelope of Gammatone filterbank outputs that represent the envelope of the auditory nerve response. The subband amplitude modulations, which are captured through smoothed Hilbert envelopes (a.k.a. temporal envelopes), carry useful acoustic information and have been shown to be robust to signal degradations. Effectiveness of the proposed front-end, which is entitled mean Hilbert envelope coefficients (MHEC), is evaluated in the context of SID and LID tasks using degraded speech material from the DARPA Robust Automatic Transcription of Speech (RATS) program. In addition, we investigate the impact of the dynamic range compression stage in the MHEC feature extraction process on performance using logarithmic and power-law non-linearities. Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law non-linearity consistently yields the best performance across different conditions for both SID and LID tasks. © 2015 Elsevier B.V. All rights reserved.
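The abstract's processing chain (Gammatone filterbank, Hilbert envelope per subband, smoothing, dynamic range compression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the FIR Gammatone approximation with the Glasberg-Moore ERB formula, the 25 ms / 10 ms framing, the use of the per-frame mean as the smoothing step, and the power-law exponent of 1/15 are all assumptions chosen for illustration. Any later stages of the published front-end are omitted.

```python
import numpy as np


def gammatone_ir(fc, fs, order=4, duration=0.025):
    """FIR approximation of a Gammatone impulse response (assumed design)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth in Hz
    b = 1.019 * erb                            # bandwidth parameter of the filter
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))       # normalise to unit energy


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def envelope_features(x, fs, centre_freqs, frame_len=0.025, hop=0.010,
                      compression="power", exponent=1.0 / 15.0):
    """Per-frame mean subband envelopes followed by log or power-law compression."""
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    feats = np.empty((n_frames, len(centre_freqs)))
    for j, fc in enumerate(centre_freqs):
        band = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        env = hilbert_envelope(band)
        for i in range(n_frames):
            # The frame mean acts as the envelope smoother in this sketch.
            feats[i, j] = env[i * fhop:i * fhop + flen].mean()
    if compression == "log":
        return np.log(feats + 1e-10)           # small floor avoids log(0)
    return feats ** exponent                   # assumed power-law exponent


if __name__ == "__main__":
    fs = 8000
    tone = np.sin(2 * np.pi * 1000 * np.arange(fs // 2) / fs)
    print(envelope_features(tone, fs, [500.0, 1000.0, 2000.0]).shape)
```

Switching `compression` between `"log"` and `"power"` mirrors the comparison of non-linearities described in the abstract, although the exponent used here is only a placeholder.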
Pages: 138-148
Number of pages: 11
Related papers
50 in total
  • [21] THE EFFECT OF LANGUAGE FACTORS FOR ROBUST SPEAKER RECOGNITION
    Lu, Liang
    Dong, Yuan
    Zhao, Xianyu
    Liu, Jiqing
    Wang, Haila
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4217 - +
  • [22] Bionic Cepstral coefficients (BCC): A new auditory feature extraction to noise-robust speaker identification
    Zouhir, Youssef
    Zarka, Mohamed
    Ouni, Kais
    APPLIED ACOUSTICS, 2024, 221
  • [23] Modified Mel-frequency Cepstral Coefficients (MMFCC) in Robust Text-dependent Speaker Identification
    Islam, Md. Atiqul
    2017 4TH INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL ENGINEERING (ICAEE), 2017, : 505 - 509
  • [24] Speaker Modeling Using Emotional Speech for More Robust Speaker Identification
    M. Milošević
    Ž. Nedeljković
    U. Glavitsch
    Ž. Đurović
    Journal of Communications Technology and Electronics, 2019, 64 : 1256 - 1265
  • [25] Speaker Modeling Using Emotional Speech for More Robust Speaker Identification
    Milosevic, M.
    Nedeljkovic, Z.
    Glavitsch, U.
    Durovic, Z.
    JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2019, 64 (11) : 1256 - 1265
  • [26] Robust Speaker Identification in Noisy and Reverberant Conditions
    Zhao, Xiaojia
    Wang, Yuxuan
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 836 - 845
  • [27] Noise Robust Speaker Identification by Dividing MFCC
    Matsumoto, Kizuki
    Hayasaka, Noboru
    Iiguni, Youji
    2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 652 - 655
  • [28] Robust speaker identification in the presence of car noise
    Deshpande, Mangesh S.
    Holambe, Raghunath S.
    INTERNATIONAL JOURNAL OF BIOMETRICS, 2011, 3 (03) : 189 - 205
  • [29] Application of KPCA and PNN for robust speaker identification
    Ren, Xue-Hui
    Zhang, Ya-Fen
    Xing, Yu-Juan
    Li, Ming
    CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 4, PROCEEDINGS, 2008, : 533 - 536
  • [30] CASA-Based Robust Speaker Identification
    Zhao, Xiaojia
    Shao, Yang
    Wang, DeLiang
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05) : 1608 - 1616