Static and dynamic spectral features: Their noise robustness and optimal weights for ASR

Cited by: 7
Authors
Yang, Chen
Soong, Frank K.
Lee, Tan
Affiliations
[1] Chinese Univ Hong Kong, Dept Elect Engn, Shatin, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
discriminative training; dynamic features; exponential weighting; noise robustness
DOI
10.1109/TASL.2006.885932
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we investigate the relative noise robustness of dynamic and static spectral features in speech recognition. It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart. The results are consistent across different types of noise and over a wide range of noise levels. To exploit this unequal robustness, we propose a simple yet effective strategy of exponentially weighting the likelihoods that are contributed by the static and dynamic features during the decoding process. The optimal weights are discriminatively trained with a small amount of development data. This method is evaluated on two speaker-independent, connected digit databases, one in English (Aurora 2) and the other in Cantonese (CUDIGIT). For various types of noise at different signal-to-noise ratios (SNRs), the average relative word error rate reductions attained with the discriminatively trained weights are 36.6% and 41.9% for Aurora 2 and CUDIGIT, respectively. Noticeable performance improvement can be observed even when there is channel distortion. The proposed approach is appealing for practical applications because 1) noise estimation is not required, 2) model adaptation is not required, 3) only a minor modification of the decoding process is needed, and 4) only a few feature weights need to be trained.
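The exponential weighting of static and dynamic likelihoods mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single diagonal-covariance Gaussian per state, a feature vector laid out as static coefficients followed by dynamic coefficients, and hand-picked weights. The function names, the feature layout, and all numeric values are hypothetical; the abstract does not specify the model form, and in the paper the weights are trained discriminatively rather than set by hand.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_loglik(x, mean, var, n_static, w_static, w_dynamic):
    """Exponentially weighted likelihood, computed in the log domain.

    p_static**w_static * p_dynamic**w_dynamic becomes a weighted sum of
    log-likelihoods. The first n_static dimensions are treated as static
    features and the rest as dynamic features (an assumed layout).
    """
    ll_static = diag_gauss_loglik(x[:n_static], mean[:n_static], var[:n_static])
    ll_dynamic = diag_gauss_loglik(x[n_static:], mean[n_static:], var[n_static:])
    return w_static * ll_static + w_dynamic * ll_dynamic


# Toy frame: 2 static + 2 dynamic coefficients (illustrative values only).
frame = [0.5, -1.0, 0.1, 0.2]
mean = [0.0, 0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0, 1.0]

# Equal weights reproduce the ordinary (unweighted) log-likelihood.
baseline = weighted_loglik(frame, mean, var, 2, 1.0, 1.0)

# Down-weighting the static stream (hypothetical weight 0.5) reduces its
# influence on the score used during decoding.
reweighted = weighted_loglik(frame, mean, var, 2, 0.5, 1.0)
```

Because a diagonal-covariance likelihood factorizes over dimensions, the weighting only changes how per-stream log-likelihoods are combined, which is consistent with the abstract's remark that only a minor modification of the decoding process is needed.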
Pages: 1087-1097
Number of pages: 11