Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Cited by: 5
Authors
Rodomagoulakis, Isidoros [1]
Maragos, Petros [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Athens 15773, Greece
Keywords
Frequency modulation features; demodulation; deep bottleneck features; distant speech recognition; FEATURE-EXTRACTION; ENERGY; TRACKING;
DOI
10.1109/JSTSP.2019.2923372
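Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract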
Frequency modulation features capture the fine structure of speech formants and complement the traditional energy-based cepstral features by carrying supplementary information. Improvements have been demonstrated mainly in Gaussian mixture model (GMM)-hidden Markov model (HMM) systems for small- and large-vocabulary tasks. Yet they have seen limited application in deep neural network (DNN)-HMM systems and distant speech recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of MFCCs, resulting in discriminant and speaker-adapted features of large temporal contexts. We explore: 1) multichannel demodulation schemes for multi-microphone setups; 2) richer descriptors of frequency modulations; and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs, yielding modest and consistent improvements in multichannel DSR tasks in reverberant and noisy environments, where recognition rates are far from human performance.
Pages: 841 - 849
Number of pages: 9
Related papers
50 in total
  • [31] On distant speech recognition for home automation
    Vacher, Michel
    Lecouteux, Benjamin
    Portet, François
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, 8700 : 161 - 188
  • [32] Frequency-warping invariant features for automatic speech recognition
    Mertins, Alfred
    Rademacher, Jan
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 5883 - 5886
  • [33] Improved speech emotion recognition with Mel frequency magnitude coefficient
    Ancilin, J.
    Milton, A.
    APPLIED ACOUSTICS, 2021, 179
  • [34] PHONEME RECOGNITION USING SPECTRAL ENVELOPE AND MODULATION FREQUENCY FEATURES
    Thomas, Samuel
    Ganapathy, Sriram
    Hermansky, Hynek
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4453 - +
  • [35] FREQUENCY MODULATION OF SPEECH
    SILBIGER, HR
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1964, 36 (10): : 2001 - &
  • [36] IMPROVED TONE MODELING BY EXPLOITING ARTICULATORY FEATURES FOR MANDARIN SPEECH RECOGNITION
    Chao, Hao
    Yang, Zhanlei
    Liu, Wenju
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4741 - 4744
  • [37] Model-based Articulatory Phonetic Features for Improved Speech Recognition
    Huang, Guangpu
    Er, Meng Joo
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [38] A Study of Bootstrapping with Multiple Acoustic Features for Improved Automatic Speech Recognition
    Cui, Xiaodong
    Xue, Jian
    Xiang, Bing
    Zhou, Bowen
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 248 - 251
  • [39] SPEECH RECOGNITION EXPERIENCE WITH MULTICHANNEL COCHLEAR IMPLANTS
    PARKIN, JL
    EDDINGTON, DK
    ORTH, JL
    BRACKMANN, DE
    OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 1984, : 33 - 33
  • [40] Adaptive Multichannel Dereverberation for Automatic Speech Recognition
    Caroselli, Joe
    Shafran, Izhak
    Narayanan, Arun
    Rose, Richard
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3877 - 3881