DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR

Times cited: 7

Authors:

Nathwani, Karan [1]

Vincent, Emmanuel [2]

Illina, Irina [3]

Affiliations:

[1] Indian Inst Technol, Jammu 181121, India

[2] Inria, F-54600 Villers Les Nancy, France

[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France

Source:

IEEE SIGNAL PROCESSING LETTERS | 2018 / Vol. 25 / No. 3

Keywords:

DNN acoustic model; GMM-derived uncertainty features; robust ASR; uncertainty decoding; DEEP NEURAL-NETWORK; FEATURE ENHANCEMENT; ACOUSTIC MODELS; SPEECH; RECOGNITION; COMPENSATION; ADAPTATION; SEPARATION; FRAMEWORK

DOI:

10.1109/LSP.2018.2791534

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

The uncertainty decoding framework is known to improve the performance of deep neural network (DNN)-based automatic speech recognition (ASR) in noisy environments. It operates by estimating the statistical uncertainty about the input features and propagating it to the output senone posteriors by sampling. Unfortunately, this approximate propagation scheme limits the performance improvement. In this letter, we exploit the fact that uncertainty propagation can be achieved in closed form for Gaussian mixture acoustic models (GMMs). We introduce new GMM-derived (GMMD) uncertainty features for robust DNN-based acoustic model training and decoding. The GMMD features are computed as the difference between the GMM log-likelihoods obtained with versus without uncertainty. They are concatenated with conventional acoustic features and used as inputs to the DNN. We evaluate the resulting ASR performance on the CHiME-2 and CHiME-3 datasets. The proposed features are shown to improve performance on both datasets, both with conventional decoding and with uncertainty decoding using different uncertainty estimation/propagation techniques.
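The GMMD computation described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes diagonal-covariance GMMs, a diagonal Gaussian uncertainty (a per-dimension variance) around each enhanced feature frame, closed-form propagation by adding that variance to every component variance, and one GMMD feature per GMM (for example, one GMM per HMM state). The function names `gmm_loglik` and `gmmd_features` and all toy parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_loglik(x: Sequence[float], gmm: Sequence[Component],
               uncertainty: Optional[Sequence[float]] = None) -> float:
    """Log-likelihood of feature frame x under a diagonal GMM.

    If `uncertainty` (per-dimension variance of the feature estimate) is given,
    it is propagated in closed form: the expected Gaussian likelihood under
    x ~ N(x_hat, uncertainty) equals N(x_hat; mean, var + uncertainty).
    """
    terms = []
    for weight, mean, var in gmm:
        if uncertainty is not None:
            var = [v + u for v, u in zip(var, uncertainty)]
        terms.append(math.log(weight) + log_gauss_diag(x, mean, var))
    return log_sum_exp(terms)


def gmmd_features(x_hat: Sequence[float], uncertainty: Sequence[float],
                  gmms: Sequence[Sequence[Component]]) -> List[float]:
    """Hypothetical GMMD vector: log-likelihood with minus without uncertainty."""
    return [gmm_loglik(x_hat, g, uncertainty) - gmm_loglik(x_hat, g)
            for g in gmms]


if __name__ == "__main__":
    # Toy 2-D GMMs (hypothetical parameters, e.g. one GMM per HMM state).
    gmms = [
        [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [2.0, 1.0], [0.5, 2.0])],
        [(1.0, [-1.0, 3.0], [2.0, 1.0])],
    ]
    x_hat = [0.5, 0.2]    # enhanced feature frame
    sigma2 = [0.3, 0.1]   # estimated uncertainty (variance) per dimension
    feats = gmmd_features(x_hat, sigma2, gmms)
    dnn_input = list(x_hat) + feats   # concatenated DNN input frame
    print(dnn_input)
```

Setting every uncertainty value to zero makes all GMMD features exactly zero, so the features only carry information when the front end reports non-zero uncertainty. The variance addition is exact for Gaussian components, which is why no sampling step is needed here, in contrast to the sampling-based propagation through a DNN mentioned in the abstract.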


Pages: 338–342

Number of pages: 5

Number of references: 50
