Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems

Cited by: 0
Authors
Kumar, Kshitiz [1]
Liu, Chaojun [1]
Gong, Yifan [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
Speech recognition; denoising; delta-features; temporal-difference; DNNs; nonlinearity
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: accuracy degrades significantly in noisy environments compared with clean ones. Many current DNN-based ASR engines use log-MelSpectra features together with their temporal-difference delta and delta-delta features. In this work we introduce delta-MelSpectra features to seek significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate the delta-MelSpectra features on a multistyle-trained DNN-ASR system; tested on large-scale WindowsPhone client data, they yield 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
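The contrast the abstract draws, between temporal differences of log-MelSpectra and temporal differences taken directly in the MelSpectra domain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the regression window width of 2, the edge padding, and the flooring constant are all assumptions chosen for illustration, and any nonlinearity or normalization the paper applies afterwards is not reproduced here.

```python
import numpy as np


def temporal_delta(frames, width=2):
    """Regression-style temporal difference (assumed window of +/- `width` frames).

    frames: array of shape (num_frames, num_bins).
    """
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    # Repeat the first and last frame so every frame has full context.
    padded = np.pad(frames, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    delta = np.zeros_like(frames)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + num_frames]
        behind = padded[width - k : width - k + num_frames]
        delta += k * (ahead - behind)
    return delta / denom


def log_mel_deltas(mel_power, floor=1e-10):
    """Conventional deltas: difference taken after the log compression."""
    return temporal_delta(np.log(np.maximum(mel_power, floor)))


def mel_domain_deltas(mel_power):
    """Illustrative delta-MelSpectra: difference taken on the linear Mel spectra."""
    return temporal_delta(mel_power)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.uniform(0.1, 1.0, size=(20, 4))   # hypothetical Mel power frames
    stationary_noise = 0.5                          # hypothetical constant additive noise power
    noisy = speech + stationary_noise
    print(np.allclose(mel_domain_deltas(noisy), mel_domain_deltas(speech)))
    print(np.allclose(log_mel_deltas(noisy), log_mel_deltas(speech)))
```

In this sketch, a noise power that is constant over time and additive in the Mel domain cancels out of the linear-domain difference, whereas it does not cancel once the log has been applied first. That is a property of the sketch under these idealized assumptions, not a claim about the results reported in the paper.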
Pages: 2445-2448
Number of pages: 4