AMPLITUDE MODULATION SPECTROGRAM BASED FEATURES FOR ROBUST SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS

Authors
Moritz, Niko [1]
Anemueller, Joern [2]
Kollmeier, Birger [1]
Affiliations
[1] Fraunhofer IDMT Project Grp Hearing Speech & Audi, Oldenburg, Germany
[2] Carl von Ossietzky Univ Oldenburg, Med Phys, Dept Phys, Oldenburg, Germany
Keywords
Amplitude Modulation Spectrogram (AMS); Feature Extraction; Reverberation; Phase; Automatic Speech Recognition (ASR); RECEPTION; SPECTRUM;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this contribution, we present a feature extraction method that relies on modulation-spectral analysis of amplitude fluctuations within sub-bands of the acoustic spectrum using a short-time Fourier transform (STFT). The experimental results indicate that the optimal temporal filter extension for amplitude modulation analysis is around 310 ms. It is also demonstrated that the phase information of the modulation spectrum contains important cues for speech recognition. In this context, the advantage of an odd analysis basis function is considered. The best features presented reached a total relative improvement of 53.5% for clean-condition training on Aurora-2. Furthermore, it is shown that modulation features are more robust against room reverberation than conventional cepstral and dynamic features, and that they strongly benefit from a high early-to-late energy ratio of the characteristic room impulse response (RIR).
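The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea, not the authors' method. It assumes sub-band envelopes are obtained by pooling STFT magnitudes into equal-width bands, and that each envelope is then projected onto windowed cosine (even) and sine (odd) basis functions centered in a 310 ms analysis window. The function names, the band layout, the 25 ms / 10 ms framing, the mean removal, and the modulation frequencies are all hypothetical choices made for illustration.

```python
import numpy as np


def band_envelopes(signal, fs, frame_ms=25.0, hop_ms=10.0, n_bands=8):
    """Sub-band amplitude envelopes from an STFT (assumed framing and bands)."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_bins)
    # Assumption: equal-width bands; a real front end might use a mel-like scale.
    bands = np.array_split(spectrum, n_bands, axis=1)
    return np.stack([b.mean(axis=1) for b in bands], axis=1)  # (n_frames, n_bands)


def ams_features(envelopes, hop_ms=10.0, mod_win_ms=310.0,
                 mod_freqs_hz=(4.0, 8.0, 16.0)):
    """Project envelope segments onto even (cosine) and odd (sine) bases."""
    mod_len = int(round(mod_win_ms / hop_ms))  # 31 frames for 310 ms
    # Time axis centered on the window, so cosine is even and sine is odd.
    t = (np.arange(mod_len) - (mod_len - 1) / 2) * hop_ms / 1000.0
    taper = np.hanning(mod_len)
    freqs = np.asarray(mod_freqs_hz)[:, None]
    even_basis = taper * np.cos(2 * np.pi * freqs * t)  # (n_mod, mod_len)
    odd_basis = taper * np.sin(2 * np.pi * freqs * t)
    n_seg = envelopes.shape[0] - mod_len + 1
    segs = np.stack([envelopes[i:i + mod_len] for i in range(n_seg)])
    # Assumption: remove each segment's mean to limit leakage from the DC level.
    segs = segs - segs.mean(axis=1, keepdims=True)
    even = np.einsum("ml,slb->smb", even_basis, segs)  # (n_seg, n_mod, n_bands)
    odd = np.einsum("ml,slb->smb", odd_basis, segs)
    return even, odd
```

The modulation magnitude can then be taken as `np.hypot(even, odd)` and the modulation phase as `np.arctan2(odd, even)`; keeping `even` and `odd` separately is one way to retain the phase information the abstract refers to.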
Pages: 5492-5495
Number of pages: 4