DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR

Cited by: 7
Authors
Nathwani, Karan [1]
Vincent, Emmanuel [2]
Illina, Irina [3]
Affiliations
[1] Indian Inst Technol, Jammu 181121, India
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords
DNN acoustic model; GMM-derived uncertainty features; robust ASR; uncertainty decoding; DEEP NEURAL-NETWORK; FEATURE ENHANCEMENT; ACOUSTIC MODELS; SPEECH; RECOGNITION; COMPENSATION; ADAPTATION; SEPARATION; FRAMEWORK
DOI
10.1109/LSP.2018.2791534
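Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract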
The uncertainty decoding framework is known to improve the deep neural network (DNN)-based automatic speech recognition (ASR) performance in noisy environments. It operates by estimating the statistical uncertainty about the input features and propagating it to the output senone posteriors by sampling. Unfortunately, this approximate propagation scheme limits the performance improvement. In this letter, we exploit the fact that uncertainty propagation can be achieved in closed form for Gaussian mixture acoustic models (GMMs). We introduce new GMM-derived (GMMD) uncertainty features for the robust DNN-based acoustic model training and decoding. The GMMD features are computed as the difference between the GMM log-likelihoods obtained with versus without uncertainty. They are concatenated with conventional acoustic features and used as inputs to the DNN. We evaluate the resulting ASR performance on the CHiME-2 and CHiME-3 datasets. The proposed features are shown to improve the performance on both datasets, both for the conventional decoding and for the uncertainty decoding with different uncertainty estimation/propagation techniques.
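The feature construction described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs and the standard closed-form propagation rule for Gaussians, in which the per-dimension feature uncertainty variance is added to each component variance. The function names, the `(weights, means, variances)` tuple layout, and the choice of one GMMD feature per GMM are illustrative assumptions, not details taken from the letter.

```python
import math


def diag_gmm_loglik(x, weights, means, variances, uncertainty=None):
    """Log-likelihood of one frame x under a diagonal-covariance GMM.

    If `uncertainty` (per-dimension variance of the feature estimate) is
    given, it is added to every component variance, which is the closed-form
    way to propagate Gaussian feature uncertainty through a GMM.
    """
    if uncertainty is None:
        uncertainty = [0.0] * len(x)
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd, ud in zip(x, mu, var, uncertainty):
            v = vd + ud  # inflated variance
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xd - md) ** 2 / v)
        log_terms.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def gmmd_uncertainty_features(x, uncertainty, gmms):
    """One feature per GMM: log-likelihood with minus without uncertainty.

    `gmms` is a list of (weights, means, variances) tuples (assumed layout).
    """
    return [
        diag_gmm_loglik(x, *g, uncertainty=uncertainty) - diag_gmm_loglik(x, *g)
        for g in gmms
    ]


def dnn_input(x, uncertainty, gmms):
    """Concatenate the conventional features with the GMMD features."""
    return list(x) + gmmd_uncertainty_features(x, uncertainty, gmms)


# Toy usage: a single one-dimensional, single-component GMM.
toy_gmms = [([1.0], [[0.0]], [[1.0]])]
print(dnn_input([0.0], [1.0], toy_gmms))
```

With zero uncertainty the GMMD features are all zero, so in this sketch the augmented input reduces to the conventional features padded with zeros.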
Pages: 338-342
Page count: 5
Related papers
50 entries in total
  • [1] AN EXTENDED EXPERIMENTAL INVESTIGATION OF DNN UNCERTAINTY PROPAGATION FOR NOISE ROBUST ASR
    Nathwani, Karan
    Morales-Cordovilla, Juan A.
    Sivasankaran, Sunit
    Illina, Irina
    Vincent, Emmanuel
    [J]. 2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 26 - 30
  • [2] Nonparametric Uncertainty Estimation and Propagation for Noise Robust ASR
    Tran, Dung T.
    Vincent, Emmanuel
    Jouvet, Denis
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1835 - 1846
  • [3] EXTENSION OF UNCERTAINTY PROPAGATION TO DYNAMIC MFCCS FOR NOISE ROBUST ASR
    Tran, Dung T.
    Vincent, Emmanuel
    Jouvet, Denis
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] CONSISTENT DNN UNCERTAINTY TRAINING AND DECODING FOR ROBUST ASR
    Nathwani, Karan
    Vincent, Emmanuel
    Illina, Irina
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 185 - 192
  • [5] DISCRIMINATIVE UNCERTAINTY ESTIMATION FOR NOISE ROBUST ASR
    Tran, Dung T.
    Vincent, Emmanuel
    Jouvet, Denis
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5038 - 5042
  • [6] FUSION OF MULTIPLE UNCERTAINTY ESTIMATORS AND PROPAGATORS FOR NOISE ROBUST ASR
    Tran, Dung T.
    Vincent, Emmanuel
    Jouvet, Denis
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [7] An Uncertainty Propagation Approach to Robust ASR Using the ETSI Advanced Front-End
    Astudillo, Ramon Fernandez
    Kolossa, Dorothea
    Mandelartz, Philipp
    Orglmeister, Reinhold
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (05) : 824 - 833
  • [8] Model-based feature enhancement with uncertainty decoding for noise robust ASR
    Stouten, Veronique
    Van hamme, Hugo
    Wambacq, Patrick
    [J]. SPEECH COMMUNICATION, 2006, 48 (11) : 1502 - 1514
  • [9] EARLY FUSION OF SPARSE CLASSIFICATION AND GMM FOR NOISE ROBUST ASR
    Sun, Yang
    Gemmeke, Jort F.
    Cranen, Bert
    ten Bosch, Louis
    Boves, Lou
    [J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1495 - 1499
  • [10] Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling
    Tran, Dung T.
    Delcroix, Marc
    Ogawa, Atsunori
    Nakatani, Tomohiro
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3852 - 3856