Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Cited by: 5
Authors
Rodomagoulakis, Isidoros [1]
Maragos, Petros [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Athens 15773, Greece
Keywords
Frequency modulation features; demodulation; deep bottleneck features; distant speech recognition; feature extraction; energy; tracking
DOI
10.1109/JSTSP.2019.2923372
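Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809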
Abstract
Frequency modulation features capture the fine structure of speech formants and complement the traditional energy-based cepstral features by carrying supplementary information. Improvements have been demonstrated mainly in Gaussian mixture model (GMM)-hidden Markov model (HMM) systems for small and large vocabulary tasks. Yet they have seen limited application in deep neural network (DNN)-HMM systems and distant speech recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of MFCCs, resulting in discriminant and speaker-adapted features of large temporal contexts. We explore: 1) multichannel demodulation schemes for multi-microphone setups; 2) richer descriptors of frequency modulations; and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs, yielding modest and consistent improvements in multichannel DSR tasks in reverberant and noisy environments, where recognition rates are far from human performance.
Pages: 841-849
Number of pages: 9