Normalizing the speech modulation spectrum for robust speech recognition

Cited by: 0
Authors
Xiao, Xiong [1, 2]
Chng, Eng Siong [1]
Li, Haizhou [1, 2]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] Inst Infocomm Res, Singapore, Singapore
Keywords
speech recognition; feature normalization; modulation spectrum; square-root Wiener filter; temporal filter;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents a novel feature normalization technique for robust speech recognition. The proposed technique normalizes the temporal structure of the features to reduce the feature variation caused by environmental interference. Specifically, it normalizes the utterance-dependent feature modulation spectrum to a reference function by filtering the features with a square-root Wiener filter in the temporal domain. We show experimentally that the proposed technique, when combined with mean and variance normalization (MVN), significantly reduces the word error rate on the AURORA-2 task, with a relative error rate reduction of 69.11% compared to the base me.
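The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the modulation spectrum is estimated or how the filter is realized, so the Welch-style estimate, the window and hop sizes, the zero-phase FIR design, and all function names below (welch_psd, sqrt_wiener_filter, normalize_trajectory) are assumptions made for illustration. The sketch treats one feature dimension as a time trajectory, builds a temporal filter whose magnitude response is the square root of the ratio between a reference modulation spectrum and the utterance's own, and applies MVN afterwards.

```python
import numpy as np

def welch_psd(x, win=32, hop=16):
    """Averaged periodogram of a 1-D feature trajectory (Hann window).

    Assumption: a Welch-style estimate stands in for the utterance
    modulation spectrum; the paper may use a different estimator.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    if len(x) < win:                       # pad very short utterances
        x = np.pad(x, (0, win - len(x)))
    w = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    segs = np.stack([x[s:s + win] * w for s in starts])
    return (np.abs(np.fft.rfft(segs, axis=1)) ** 2).mean(axis=0) / (w ** 2).sum()

def sqrt_wiener_filter(utt_psd, ref_psd, eps=1e-10):
    """Symmetric FIR taps with magnitude response ~ sqrt(ref_psd / utt_psd)."""
    gain = np.sqrt(ref_psd / (utt_psd + eps))
    taps = np.fft.fftshift(np.fft.irfft(gain))  # real, even impulse response
    return taps[1:]                        # odd length, peak at the centre tap

def normalize_trajectory(x, ref_psd, win=32, hop=16):
    """Filter one trajectory toward the reference spectrum, then apply MVN."""
    x = np.asarray(x, dtype=float)
    h = sqrt_wiener_filter(welch_psd(x, win, hop), ref_psd)
    c = len(h) // 2                        # centre tap index, removes the delay
    y = np.convolve(x - x.mean(), h)[c:c + len(x)]
    return (y - y.mean()) / (y.std() + 1e-10)

# Hypothetical usage: the reference spectrum is estimated from clean data
# (random numbers here stand in for real feature trajectories).
rng = np.random.default_rng(0)
ref = welch_psd(rng.standard_normal(2000))
y = normalize_trajectory(rng.standard_normal(200), ref)
```

In practice the reference spectrum would be estimated once per feature dimension from clean training data and the same procedure applied to every dimension of every utterance; those choices are also assumptions, not details taken from the abstract.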
Pages: 1021 / +
Page count: 2
Related papers
50 in total
  • [21] Spectrum filtering with FRM for robust speech recognition
    Hayasaka, Noboru
    Miyanaga, Yoshikazu
    [J]. 2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS, 2006, : 3285 - +
  • [22] Modulation spectrum analysis for recognition of reverberant speech
    Mallidi, Sri Harish
    Ganapathy, Sriram
    Hermansky, Hynek
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 196 - 199
  • [23] Static and Dynamic Modulation Spectrum for Speech Recognition
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2791 - 2794
  • [24] Speech feature extraction based on wavelet modulation scale for robust speech recognition
    Ma, Xin
    Zhou, Weidong
    Ju, Fang
    Jiang, Qi
    [J]. NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
  • [25] Overlapped sub-band modulation spectrum normalization techniques for robust speech recognition
    Fan, Hao-teng
    Yeh, Wei-jeih
    Hung, Jeih-weih
    [J]. 2013 10TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2013, : 1035 - 1039
  • [26] Quality-Aware Bag of Modulation Spectrum Features for Robust Speech Emotion Recognition
    Kshirsagar, Shruti Rajendra
    Falk, Tiago Henrik
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (04) : 1892 - 1905
  • [27] Direct control on modulation spectrum for noise-robust speech recognition and spectral subtraction
    Wada, Naoya
    Hayasaka, Noboru
    Yoshizawa, Shingo
    Miyanaga, Yoshikazu
    [J]. 2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS, 2006, : 2533 - +
  • [28] A robust speech analysis in speech recognition
    Miyanaga, Y
    Gozen, S
    Ohtsuki, N
    [J]. 2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 706 - 709
  • [29] MINIMUM VARIANCE MODULATION FILTER FOR ROBUST SPEECH RECOGNITION
    Chiu, Yu-Hsiang Bosco
    Stern, Richard M.
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3917 - +
  • [30] Temporal Modulation Spectral Restoration for Robust Speech Recognition
    Wang, Syu-Siang
    Tsao, Yu
    [J]. 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2016, : 481 - 486