Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems

Cited by: 0
Authors
Kumar, Kshitiz [1]
Liu, Chaojun [1]
Gong, Yifan [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
Speech recognition; denoising; delta-features; temporal-difference; DNNs; nonlinearity
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: accuracy degrades significantly in noisy environments compared with clean conditions. Many current DNN-based ASR engines use log-MelSpectra features together with their temporal-difference (delta and delta-delta) features. In this work we introduce delta-MelSpectra features to seek significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate the delta-MelSpectra features on a multistyle-trained DNN-ASR system; on large-scale WindowsPhone client data, we obtain 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
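The contrast the abstract draws, between differencing log-MelSpectra and differencing MelSpectra directly, can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the delta formula, so the two-sided difference with `lag=2`, the edge padding, and the signed power-law compression (`power=1/3`) are all assumptions, and the function names are hypothetical. The example also assumes a stationary noise term that adds in the Mel-spectral domain, which is one plausible intuition for the robustness claim rather than the paper's stated argument.

```python
import numpy as np

def temporal_difference(frames: np.ndarray, lag: int = 2) -> np.ndarray:
    """Two-sided difference d[t] = x[t+lag] - x[t-lag] over a (T, F) array.

    Edges are handled by repeating the first/last frame (an assumption).
    """
    padded = np.pad(frames, ((lag, lag), (0, 0)), mode="edge")
    return padded[2 * lag:] - padded[:-2 * lag]

def log_mel_delta(mel: np.ndarray, lag: int = 2, floor: float = 1e-10) -> np.ndarray:
    """Conventional delta: difference taken after log compression."""
    return temporal_difference(np.log(np.maximum(mel, floor)), lag)

def delta_mel_spectra(mel: np.ndarray, lag: int = 2, power: float = 1.0 / 3.0) -> np.ndarray:
    """Difference taken directly on Mel spectra, then compressed.

    The difference can be negative, so a log cannot be applied; a signed
    power law is used here as a hypothetical stand-in nonlinearity.
    """
    d = temporal_difference(mel, lag)
    return np.sign(d) * np.abs(d) ** power

# Toy data: 20 frames x 8 Mel bands of "clean" energies plus a constant
# (stationary) additive noise floor in each band.
rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 1.0, size=(20, 8))
noisy = clean + 0.5

# Maximum change in each feature type caused by the added noise.
shift_mel_domain = np.max(np.abs(delta_mel_spectra(noisy) - delta_mel_spectra(clean)))
shift_log_domain = np.max(np.abs(log_mel_delta(noisy) - log_mel_delta(clean)))
print(shift_mel_domain, shift_log_domain)
```

Under these assumptions the constant noise term cancels in the Mel-domain difference, so the first printed value is at floating-point rounding level, while the log-domain delta changes visibly because the log is applied before differencing. Real noise is not perfectly stationary, so this shows only the idealized case.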
Pages: 2445-2448
Page count: 4
Related papers
50 in total
  • [21] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Bai, Haichuan
    Ge, Fengpei
    Yan, Yonghong
    CHINA COMMUNICATIONS, 2018, 15 (09) : 235 - 243
  • [22] Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling
    Tran, Dung T.
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3852 - 3856
  • [23] DNN-Based Score Calibration With Multitask Learning for Noise Robust Speaker Verification
    Tan, Zhili
    Mak, Man-Wai
    Mak, Brian Kan-Wing
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (04) : 700 - 712
  • [24] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Bai, Haichuan
    Ge, Fengpei
    Yan, Yonghong
    CHINA COMMUNICATIONS, 2018, 15 (09) : 235 - 243
  • [25] SPATIAL DIFFUSENESS FEATURES FOR DNN-BASED SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS
    Schwarz, Andreas
    Huemmer, Christian
    Maas, Roland
    Kellermann, Walter
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4380 - 4384
  • [26] ASR systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness
    Rajnoha, Josef
    Pollak, Petr
    RADIOENGINEERING, 2011, 20 (01) : 74 - 84
  • [27] DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR
    Nathwani, Karan
    Vincent, Emmanuel
    Illina, Irina
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (03) : 338 - 342
  • [28] A DNN-Based Channel Model for Network Planning in Train Control Systems
    Wen, Tao
    Xie, Guo
    Cao, Yuan
    Cai, Baigen
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (03) : 2392 - 2399
  • [29] DNN-Based Brain MRI Classification Using Fuzzy Clustering and Autoencoder Features
    Chauhan, Nishant
    Choi, Byung-Jae
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2021, 21 (04) : 349 - 357
  • [30] Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams
    Guan, Jinglong
    Zhang, Qiyang
    Murturi, Ilir
    Donta, Praveen Kumar
    Dustdar, Schahram
    Wang, Shangguang
    ICC 2024 - IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2024, : 3803 - 3808