A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR

Cited by: 0
Authors
Wang, Syu-Siang [1]
Hung, Jeih-Weih [2]
Tsao, Yu [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan
[2] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Source
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012
Keywords
discrete wavelet transform; CMS; CMVN; RASTA; noise robust; speech recognition; SPEECH; NOISE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a cepstral subband normalization (CSN) approach for robust speech recognition. CSN first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low-frequency band (LFB) and high-frequency band (HFB) parts. It then normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the LFB and HFB components to form the normalized cepstral features. When Haar functions are used as the DWT bases, CSN can be computed efficiently, with a 50% reduction in the number of feature components. In addition, our experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction. Experimental results indicate that the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple calculation, compact form, and effective noise robustness make CSN well suited to mobile applications.
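The three steps named in the abstract (DWT, LFB normalization with HFB zeroing, inverse DWT) can be sketched for a single cepstral coefficient tracked over time. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which normalization is applied to the LFB, so mean and variance normalization is assumed here, a one-level Haar transform is assumed, and the function names and the odd-length padding rule are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length sequence.
    Returns (low, high): scaled pairwise sums and differences."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(low, high):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def mean_variance_normalize(v, eps=1e-12):
    """ASSUMPTION: the abstract only says the LFB is 'normalized';
    zero-mean, unit-variance scaling is used here for illustration."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    std = max(math.sqrt(var), eps)  # guard against a constant track
    return [(t - mean) / std for t in v]

def csn_track(track):
    """Apply the sketched CSN steps to one cepstral coefficient over time."""
    n = len(track)
    x = list(track)
    if n % 2:                       # hypothetical choice: repeat last frame
        x.append(x[-1])
    low, high = haar_dwt(x)
    low = mean_variance_normalize(low)   # normalize the LFB
    high = [0.0] * len(high)             # zero out the HFB
    return haar_idwt(low, high)[:n]
```

With the HFB set to zero, each reconstructed pair of frames carries the same value in this sketch, so only one value per pair needs to be kept. That is one way to read the 50% reduction in feature components mentioned in the abstract.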
Pages: 141-145
Page count: 5
Related papers
50 in total
  • [21] A robust blind audio watermarking using distribution of sub-band signals
    Choi, Jae-Won
    Chung, Hyun-Yeol
    Jung, Ho-Youl
    MULTIMEDIA CONTENT REPRESENTATION, CLASSIFICATION AND SECURITY, 2006, 4105 : 106 - 113
  • [22] A probabilistic union model for sub-band based robust speech recognition
    Ming, J
    Smith, FJ
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1787 - 1790
  • [23] Modeling sub-band correlation for noise-robust speech recognition
    McAuley, J
    Ming, J
    Hanna, P
    Stewart, D
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1017 - 1020
  • [24] ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY
    Zhang, Zhihao
    Lin, Jinlong
    SIGMAP 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2009, : 44 - 48
  • [25] Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments
    Lu, Xugang
    Unoki, Masashi
    Nakamura, Satoshi
    COMPUTER SPEECH AND LANGUAGE, 2011, 25 (03): : 571 - 584
  • [26] SUB-BAND CODING.
    Crochiere, Ronald E.
    The Bell System technical journal, 1981, 60 (7 pt 2): : 1633 - 1653
  • [27] A robust front-end processor combining mel frequency cepstral coefficient and sub-band spectral centroid histogram methods for automatic speech recognition
    Department of Information Technology Kongu Engineering College, Perundurai - 638 052, Erode, Tamilnadu State, India
    Not specified
    Int. J. Signal Process. Image Process. Pattern Recogn., 2008, 2 (67-74):
  • [28] Sub-band speech recognition
    Primor, D
    Furst-Yust, M
    22ND CONVENTION OF ELECTRICAL AND ELECTRONICS ENGINEERS IN ISRAEL, PROCEEDINGS, 2002, : 10 - 12
  • [29] Cepstral gain normalization for noise robust speech recognition
    Yoshizawa, Shingo
    Hayasaka, Noboru
    Wada, Naoya
    Miyanaga, Yoshikazu
    ICASSP IEEE Int Conf Acoust Speech Signal Process Proc, 1600, (I209-I212):
  • [30] Cepstral shape normalization (CSN) for robust speech recognition
    Du, Jun
    Wang, Ren-Hua
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4389 - 4392