Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems

Times cited: 0

Authors:

Kumar, Kshitiz [1]

Liu, Chaojun [1]

Gong, Yifan [1]

Affiliation:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

Speech recognition; denoising; delta-features; temporal-difference; DNNs; nonlinearity;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: compared with clean conditions, accuracy degrades significantly in noisy environments. Many current DNN-based ASR engines use log-MelSpectra features, together with their temporal differences in the form of delta and delta-delta features. In this work, we introduce delta-MelSpectra features to obtain significant gains for DNNs in noisy environments, and we demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate our delta-MelSpectra features on a multistyle-trained DNN-ASR system. Testing on large-scale WindowsPhone client data, we obtained 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
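The abstract contrasts two places where temporal differences can be taken. The conventional approach takes them in the log-Mel domain, producing delta and delta-delta features. The proposed approach takes them directly in the Mel-spectral domain. The following is a minimal Python sketch of that contrast, built on several assumptions:

- The HTK-style regression window (N = 2) and edge-frame replication are common defaults, not details given in this record.
- The function names are hypothetical.
- The sign-preserving log applied after the Mel-domain delta is an assumed nonlinearity. The abstract does not specify the authors' post-processing.

```python
import math
from typing import List

# One frame of Mel filterbank energies (linear domain, non-negative).
Frame = List[float]


def delta(frames: List[Frame], n_window: int = 2) -> List[Frame]:
    """HTK-style regression delta over time (assumed window, not from the paper).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where edge frames are replicated so every frame has N neighbours per side.
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []
    denom = 2.0 * sum(n * n for n in range(1, n_window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * len(frames[t])
        for n in range(1, n_window + 1):
            nxt = frames[min(t + n, num_frames - 1)]
            prv = frames[max(t - n, 0)]
            for k in range(len(acc)):
                acc[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in acc])
    return out


def log_mel(frames: List[Frame], floor: float = 1e-10) -> List[Frame]:
    """Log-MelSpectra, with a small floor to avoid log(0)."""
    return [[math.log(max(v, floor)) for v in f] for f in frames]


def conventional_features(mel: List[Frame]) -> List[Frame]:
    """Conventional stack: log-Mel plus delta and delta-delta in the log domain."""
    lm = log_mel(mel)
    d1 = delta(lm)
    d2 = delta(d1)
    return [a + b + c for a, b, c in zip(lm, d1, d2)]


def delta_melspectra(mel: List[Frame]) -> List[Frame]:
    """Temporal difference taken directly on the linear Mel spectra.

    Mel-domain deltas can be negative, so a plain log is undefined. As a
    hypothetical nonlinearity, this sketch compresses them with a
    sign-preserving log: sign(x) * log(1 + |x|).
    """
    return [[math.copysign(math.log1p(abs(v)), v) for v in f] for f in delta(mel)]


if __name__ == "__main__":
    # Toy Mel spectra (8 frames x 2 bands) and a copy with a constant
    # (stationary) additive offset standing in for stationary noise.
    clean = [[1.0 + t, 2.0 * t + 0.5] for t in range(8)]
    noisy = [[v + 3.0 for v in f] for f in clean]
    # Mel-domain deltas: the constant offset cancels, so these should match.
    print(delta(clean) == delta(noisy))
    # Log-domain deltas: log(S + N) does not separate S from N, so these differ.
    print(delta(log_mel(clean)) == delta(log_mel(noisy)))
```

The delta operator is linear, so a stationary additive term in the Mel domain cancels under it. In the log domain it does not. This is offered as one intuition for the abstract's claim, not as the authors' stated derivation. Real noise is neither purely additive in the Mel domain nor perfectly stationary.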


Pages: 2445-2448

Number of pages: 4
