Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Cited by: 5
Authors
Rodomagoulakis, Isidoros [1]
Maragos, Petros [1]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Athens 15773, Greece
Keywords
Frequency modulation features; demodulation; deep bottleneck features; distant speech recognition; FEATURE-EXTRACTION; ENERGY; TRACKING;
DOI
10.1109/JSTSP.2019.2923372
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract
Frequency modulation features capture the fine structure of speech formants and complement traditional energy-based cepstral features by carrying supplementary information. Improvements have been demonstrated mainly in Gaussian mixture model (GMM)-hidden Markov model (HMM) systems for small and large vocabulary tasks. Yet, they have seen limited application in deep neural network (DNN)-HMM systems and distant speech recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of mel-frequency cepstral coefficients (MFCCs), resulting in discriminant and speaker-adapted features of large temporal contexts. We explore: 1) multichannel demodulation schemes for multi-microphone setups; 2) richer descriptors of frequency modulations; and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs, yielding modest and consistent improvements in multichannel DSR tasks in reverberant and noisy environments, where recognition rates are far from human performance.
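The abstract does not say which demodulation algorithm is used. As a rough illustration of what demodulating a signal into instantaneous frequency and amplitude can look like, the sketch below applies the Teager-Kaiser energy operator with the DESA-2 energy separation algorithm to a synthetic tone. The choice of DESA-2, the function names `teager_kaiser` and `desa2`, and the test signal are assumptions made for this example and are not taken from the paper.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, fs):
    """Estimate instantaneous frequency (Hz) and amplitude with DESA-2.

    Hypothetical helper for illustration only; samples whose energy
    estimates are not positive are skipped.
    """
    # Symmetric difference; y[k] is centred on x[k + 1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_kaiser(x)  # psi_x[i] is centred on x[i + 1]
    psi_y = teager_kaiser(y)  # psi_y[j] is centred on x[j + 2]
    freqs, amps = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]  # align both energies on x[j + 2]
        if px <= 0.0 or py <= 0.0:
            continue
        # Clamp to the valid arccos domain to absorb rounding error.
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(arg)  # radians per sample
        freqs.append(omega * fs / (2.0 * math.pi))
        amps.append(2.0 * px / math.sqrt(py))
    return freqs, amps


# Synthetic test signal (an assumption): a 1 kHz tone sampled at 16 kHz.
fs = 16000
f0 = 1000.0
x = [0.5 * math.cos(2.0 * math.pi * f0 * n / fs) for n in range(200)]

freqs, amps = desa2(x, fs)
mean_freq = sum(freqs) / len(freqs)
mean_amp = sum(amps) / len(amps)
```

For a stationary sinusoid DESA-2 recovers the frequency and amplitude exactly, up to rounding. On real speech the operator would normally be applied to band-pass filtered signals around each formant, and the estimates are sensitive to noise.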
Pages: 841-849
Number of pages: 9