Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems

Cited by: 0
Authors
Kumar, Kshitiz [1]
Liu, Chaojun [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
Speech recognition; denoising; delta-features; temporal-difference; DNNs; nonlinearity
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: accuracy degrades significantly in noisy environments compared with clean ones. Many current DNN-based ASR engines use log-MelSpectra features together with their temporal-difference (delta and delta-delta) features. In this work we introduce delta-MelSpectra features to seek significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate the delta-MelSpectra features on a multistyle-trained DNN-ASR system; tested on large-scale WindowsPhone client data, they yield 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
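The contrast the abstract draws, between temporal differences of log-MelSpectra and temporal differences taken directly in the MelSpectra domain, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the symmetric two-frame difference, the `lag` value, and the `eps` floor are all assumptions made for illustration, and the abstract does not say what windowing or subsequent nonlinearity the paper applies. The toy data at the end shows one plausible intuition (again an assumption, not a claim from the abstract): a noise floor that is constant over time and additive in the Mel-spectral domain cancels under a linear-domain difference but not under a log-domain one.

```python
import numpy as np


def temporal_difference(frames: np.ndarray, lag: int = 2) -> np.ndarray:
    """Return x[t + lag] - x[t - lag] along the time axis (axis 0).

    `frames` has shape (num_frames, num_mel_bins). Edge frames are
    repeated so the output has the same shape as the input.
    The symmetric difference and lag=2 are illustrative assumptions.
    """
    padded = np.pad(frames, ((lag, lag), (0, 0)), mode="edge")
    return padded[2 * lag:] - padded[:-2 * lag]


def delta_features(mel_spectra: np.ndarray, lag: int = 2, eps: float = 1e-10):
    """Compute static and two kinds of delta features (hypothetical helper).

    Returns (log_mel, delta_log_mel, delta_mel), where:
      - delta_log_mel is the conventional difference of log-MelSpectra,
      - delta_mel is the difference taken directly on the Mel spectra.
    """
    log_mel = np.log(mel_spectra + eps)  # eps avoids log(0); assumed value
    delta_log_mel = temporal_difference(log_mel, lag)
    delta_mel = temporal_difference(mel_spectra, lag)
    return log_mel, delta_log_mel, delta_mel


# Toy data: random "clean" Mel spectra plus a time-constant additive floor.
rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 1.0, size=(20, 8))  # 20 frames, 8 Mel bins
noisy = clean + 0.5                          # stationary additive noise

_, dlog_clean, dmel_clean = delta_features(clean)
_, dlog_noisy, dmel_noisy = delta_features(noisy)

# The linear-domain delta is unchanged by the constant floor;
# the log-domain delta is not.
print(np.allclose(dmel_clean, dmel_noisy))
print(np.allclose(dlog_clean, dlog_noisy))
```

Real noise is neither perfectly stationary nor perfectly additive after Mel filtering, so the exact cancellation above holds only for this idealized toy case.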
Pages: 2445-2448
Page count: 4