Chain-based Discriminative Autoencoders for Speech Recognition

Cited by: 0
Authors
Lee, Hung-Shin [1]
Huang, Pin-Tuan [1]
Cheng, Yao-Fei [1]
Wang, Hsin-Min [1]
Affiliations
[1] Academia Sinica, Institute of Information Science, New Taipei, Taiwan
Source
Keywords
discriminative autoencoder; robust speech recognition; multi-condition training;
DOI
10.21437/Interspeech.2022-10474
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is built on the Kaldi toolkit, with various TDNN models serving as encoders. In this paper, we further propose three new versions of DcAE. First, we use a new objective function that considers both the categorical cross-entropy and the mutual information between ground truth and predicted triphone-state sequences. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE. In these two models, the objective function includes both the error between the reconstructed noisy speech and the input noisy speech and the error between the enhanced speech and the reference clean speech. Experimental results on the WSJ and Aurora-4 corpora show that our DcAE models outperform baseline systems.
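As a rough illustration of the two-part objective described in the abstract, the sketch below adds a per-frame squared reconstruction error to a categorical cross-entropy over triphone-state scores. It is not the authors' Kaldi implementation: the function names, the plain-list feature representation, and the weight `alpha` are assumptions made for this example, and the mutual-information (chain) term of c-DcAE is not modeled.

```python
import math


def squared_error(reconstructed, original):
    """Mean squared error between two equal-length feature vectors."""
    return sum((r - x) ** 2 for r, x in zip(reconstructed, original)) / len(original)


def cross_entropy(scores, label):
    """Categorical cross-entropy for one frame, given unnormalized scores.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[label]


def dcae_frame_loss(x, x_hat, scores, label, alpha=1.0):
    """Combined loss for one frame (hypothetical formulation).

    x      : input speech features
    x_hat  : features reconstructed by the decoder
    scores : predicted triphone-state scores from the code layer
    label  : index of the ground-truth triphone state
    alpha  : assumed weight on the reconstruction term (not from the source)
    """
    return cross_entropy(scores, label) + alpha * squared_error(x_hat, x)


if __name__ == "__main__":
    loss = dcae_frame_loss(
        x=[0.2, -0.1, 0.4],
        x_hat=[0.25, -0.05, 0.3],
        scores=[1.5, 0.3, -0.7],
        label=0,
        alpha=0.5,
    )
    print(f"frame loss: {loss:.4f}")
```

For the hc-DcAE and pc-DcAE variants, a second `squared_error` term between the enhanced speech and the reference clean speech could be added in the same way, again with an assumed weight.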
Pages: 2078-2082
Page count: 5