Powered Cepstral Normalization (P-CN) for Robust Features in Speech Recognition

被引:0
|
作者
Hsu, Chang-wen [1 ]
Lee, Lin-shan [1 ]
机构
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
关键词
Robust speech recognition; cepstral normalization; cepstral mean and variance normalization;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Cepstral normalization has been popularly used as a powerful approach to produce robust features for speech recognition. Good examples of approaches in this family include the well known Cepstral Mean Subtraction (CMS) and Cepstral Mean and Variance Normalization (CMVN), in which either the first or both the first and the second moments of the Mel-frequency Cepstral Coefficients (MFCCs) are normalized. In this paper, an improved approach of Powered Cepstral Normalization (P-CN) is proposed to normalize the MFCC parameters in the r-th powered domain, where r > 1.0. The basic idea is that when the MFCC parameters are raised to the r-th power, the harmful parts of environmental disturbances may be more emphasized than the speech features which are relatively smooth. Therefore performing the normalization in the domain of the r-th power may be more helpful. But the value of r should not be too large because in that case the environmental disturbances may be exaggerated and further corrupt the speech features. This approach is computationally simple and efficient. Initial experimental results on AURORA 2.0 testing environment showed that significant improvements in recognition rates are consistently obtainable under all different noisy conditions.
引用
收藏
页码:2538 / 2541
页数:4
相关论文
共 50 条
  • [1] Extended Powered Cepstral Normalization (P-CN) with Range Equalization for Robust Features in Speech Recognition
    Hsu, Chang-wen
    Lee, Lin-shan
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2816 - 2819
  • [2] Cepstral gain normalization for noise robust speech recognition
    Yoshizawa, Shingo
    Hayasaka, Noboru
    Wada, Naoya
    Miyanaga, Yoshikazu
    [J]. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc, 1600, (I209-I212):
  • [3] Cepstral shape normalization (CSN) for robust speech recognition
    Du, Jun
    Wang, Ren-Hua
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4389 - 4392
  • [4] PARAMETRIC CEPSTRAL MEAN NORMALIZATION FOR ROBUST SPEECH RECOGNITION
    Kalinli, Ozlem
    Bhattacharya, Gautam
    Weng, Chao
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6735 - 6739
  • [5] Cepstral gain normalization for noise robust speech recognition
    Yoshizawa, S
    Hayasaka, N
    Wada, N
    Miyanaga, Y
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 209 - 212
  • [6] Cepstral amplitude range normalization for noise robust speech recognition
    Yoshizawa, S
    Hayasaka, N
    Wada, N
    Miyanaga, Y
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (08): : 2130 - 2137
  • [7] A Cepstral PDF Normalization Method for Noise Robust Speech Recognition
    Suk, Yong Ho
    Choi, Seung Ho
    [J]. ADVANCES IN COMPUTER SCIENCE, ENVIRONMENT, ECOINFORMATICS, AND EDUCATION, PT II, 2011, 215 : 34 - +
  • [8] Robust Speech Recognition Combining Cepstral and Articulatory Features
    Zha, Zhuan-ling
    Hu, Jin
    Zhan, Qing-ran
    Shan, Ya-hui
    Xie, Xiang
    Wang, Jing
    Cheng, Hao-bo
    [J]. PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 1401 - 1405
  • [9] Cepstral vector normalization based on stereo data for robust speech recognition
    Buera, Luis
    Lleida, Eduardo
    Miguel, Antonio
    Ortega, Alfonso
    Saz, Oscar
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): : 1098 - 1113
  • [10] Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition
    Su, Chang-Wen
    Lee, Lin-Shan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02): : 205 - 220