Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Cited by: 5
Authors
Rodomagoulakis, Isidoros [1 ]
Maragos, Petros [1 ]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, Athens 15773, Greece
Keywords
Frequency modulation features; demodulation; deep bottleneck features; distant speech recognition; FEATURE-EXTRACTION; ENERGY; TRACKING;
DOI
10.1109/JSTSP.2019.2923372
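Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code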
0808 ; 0809 ;
Abstract
Frequency modulation features capture the fine structure of speech formants and complement the traditional energy-based cepstral features by carrying supplementary information. Improvements have been demonstrated mainly in Gaussian mixture model (GMM)-hidden Markov model (HMM) systems for small- and large-vocabulary tasks. Yet they have seen limited application in deep neural network (DNN)-HMM systems and distant speech recognition (DSR) tasks. Herein, we elaborate on their integration within state-of-the-art front-end schemes that include post-processing of MFCCs, resulting in discriminant and speaker-adapted features of large temporal contexts. We explore: 1) multichannel demodulation schemes for multi-microphone setups; 2) richer descriptors of frequency modulations; and 3) feature transformation and combination via hierarchical deep networks. We present results for tandem and hybrid recognition with GMM and DNN acoustic models, respectively. The improved modulation features are combined efficiently with MFCCs, yielding modest and consistent improvements in multichannel DSR tasks in reverberant and noisy environments, where recognition rates are far from human performance.
Pages: 841-849
Number of pages: 9
Related papers
50 in total
  • [1] Modulation frequency features for phoneme recognition in noisy speech
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 125 (01) : EL8 - EL12
  • [2] Modulation features for speech recognition
    Dimitriadis, D
    Maragos, P
    Potamianos, L
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 377 - 380
  • [3] MC-Whisper: Extending Speech Foundation Models to Multichannel Distant Speech Recognition
    Chang, Xuankai
    Guo, Pengcheng
    Fujita, Yuya
    Maekaku, Takashi
    Watanabe, Shinji
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2850 - 2854
  • [4] AN INVESTIGATION INTO INSTANTANEOUS FREQUENCY ESTIMATION METHODS FOR IMPROVED SPEECH RECOGNITION FEATURES
    Nayak, Shekhar
    Bhati, Saurabhchand
    Murty, K. Sri Rama
    2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 363 - 367
  • [5] HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION
    Swietojanski, Pawel
    Ghoshal, Arnab
    Renals, Steve
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 285 - 290
  • [6] Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the Wild
    Avila, Anderson R.
    Akhtar, Zahid
    Santos, Joao F.
    O'Shaughnessy, Douglas
    Falk, Tiago H.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (01) : 177 - 188
  • [7] Constructing modulation frequency domain-based features for robust speech recognition
    Hung, Jeih-Weih
    Tsai, Wei-Yi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (03) : 563 - 577
  • [8] On the Improvement of Modulation Features Using Multi-Microphone Energy Tracking for Robust Distant Speech Recognition
    Rodomagoulakis, Isidoros
    Maragos, Petros
    2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 558 - 562
  • [9] Modulation and chaotic acoustic features for speech recognition
    Dimitriadis, D.
    Maragos, P.
    Pitsikalis, V.
    Potamianos, A.
    Control and Intelligent Systems, 2002, 30 (01) : 19 - 26
  • [10] Optimization of Temporal Filters in the Modulation Frequency Domain for Constructing Robust Features in Speech Recognition
    Hung, Jeih-weih
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 489 - 492