Chain-based Discriminative Autoencoders for Speech Recognition

Cited by: 0

Authors:

Lee, Hung-Shin [1]

Huang, Pin-Tuan [1]

Cheng, Yao-Fei [1]

Wang, Hsin-Min [1]

Affiliation:

[1] Acad Sinica, Inst Informat Sci, New Taipei, Taiwan

Source:

INTERSPEECH 2022 | 2022

Keywords:

discriminative autoencoder; robust speech recognition; multi-condition training

DOI:

10.21437/Interspeech.2022-10474

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is developed based on the Kaldi toolkit by treating various TDNN models as encoders. In this paper, we further propose three new versions of DcAE. First, a new objective function that considers both categorical cross-entropy and mutual information between ground truth and predicted triphone-state sequences is used. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE. In these two models, both the error between the reconstructed noisy speech and the input noisy speech and the error between the enhanced speech and the reference clean speech are taken into the objective function. Experimental results on the WSJ and Aurora-4 corpora show that our DcAE models outperform baseline systems.
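The two training criteria described in the abstract can be illustrated with a minimal per-frame sketch. This is not the authors' Kaldi implementation: the function names, the weights `recon_weight` and `enh_weight`, and the plain weighted sum are assumptions made for illustration only. The mutual-information (chain) term used in c-DcAE is a sequence-level criterion and is deliberately left out.

```python
import math


def squared_error(target, estimate):
    """Sum of squared differences between two feature vectors."""
    return sum((t - e) ** 2 for t, e in zip(target, estimate))


def cross_entropy(scores, label):
    """Categorical cross-entropy of unnormalized scores against one label.

    Computed as log-sum-exp(scores) - scores[label], with the maximum
    subtracted first for numerical stability.
    """
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def dcae_frame_loss(x, x_hat, scores, label, recon_weight=1.0):
    """Hypothetical DcAE-style loss for one frame.

    Combines the classification term on the code layer with the
    reconstruction error between input x and reconstruction x_hat.
    recon_weight is an assumed interpolation weight.
    """
    return cross_entropy(scores, label) + recon_weight * squared_error(x, x_hat)


def robust_frame_loss(noisy, noisy_hat, clean, enhanced, scores, label,
                      recon_weight=1.0, enh_weight=1.0):
    """Hypothetical loss in the spirit of hc-DcAE / pc-DcAE for one frame.

    Adds an enhancement term (enhanced vs. reference clean speech) to the
    noisy-speech reconstruction term. Both weights are assumptions.
    """
    return (cross_entropy(scores, label)
            + recon_weight * squared_error(noisy, noisy_hat)
            + enh_weight * squared_error(clean, enhanced))
```

With a perfect reconstruction, the loss reduces to the classification term alone; for example, `dcae_frame_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], 0)` equals `math.log(2)`, the cross-entropy of a uniform two-class prediction.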


Pages: 2078-2082

Page count: 5
