Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-target Learning for Noisy Speech Recognition

Cited by: 15

Authors:

Mimura, Masato [1]

Sakai, Shinsuke [1]

Kawahara, Tatsuya [1]

Affiliation:

[1] Kyoto Univ, Sch Informat, Sakyo Ku, Kyoto 6068501, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

Speech Recognition; Speech Enhancement; Deep Neural Network (DNN); Denoising Autoencoder (DAE); Deep Neural Networks; Adaptation

DOI:

10.21437/Interspeech.2016-388

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Denoising autoencoders (DAEs) have been investigated for enhancing noisy speech before it is fed to the back-end deep neural network (DNN) acoustic model. However, the DAE output may not match the input that the back-end DNN expects, and the training objective functions of the two networks are inconsistent. In this paper, a joint optimization method for the front-end DAE and the back-end DNN is proposed, based on a multi-target learning scheme. In the first step, the front-end DAE is trained with an additional target of minimizing the errors propagated by the back-end DNN. Then, the unified network of DAE and DNN is fine-tuned for the phone state classification target, with an extra target of input speech enhancement imposed on the DAE part. The proposed method has been evaluated on the CHiME3 ASR task and is shown to improve on both the baseline DNN and the simple coupling of DAE with DNN. The method is also effective as a post-filter of a beamformer.
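The multi-target idea in the abstract can be illustrated with a toy loss that mixes an enhancement error and a classification error. This is only a minimal sketch under stated assumptions, not the authors' formulation: the single linear layers standing in for the DAE and DNN, the mean-squared-error and cross-entropy terms, the interpolation weight `alpha`, and all names (`multi_target_loss`, `W_dae`, `W_dnn`) are hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def multi_target_loss(noisy, clean, label, W_dae, W_dnn, alpha=0.5):
    """Weighted sum of an enhancement loss and a classification loss.

    Assumption: the front end and back end are each a single linear
    layer, and the two targets are combined by simple interpolation.
    """
    enhanced = matvec(W_dae, noisy)          # toy stand-in for the DAE
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    probs = softmax(matvec(W_dnn, enhanced))  # toy stand-in for the DNN
    ce = -math.log(probs[label])             # phone-state cross-entropy
    total = alpha * mse + (1.0 - alpha) * ce
    return total, mse, ce


# One made-up feature frame: noisy input, clean target, state label 0.
noisy = [1.1, -0.4, 0.3]
clean = [1.0, -0.5, 0.2]
W_dae = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_dnn = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

total, mse, ce = multi_target_loss(noisy, clean, 0, W_dae, W_dnn, alpha=0.5)
print(total, mse, ce)
```

In this sketch, setting `alpha` to 1.0 reduces the loss to pure enhancement, and setting it to 0.0 reduces it to pure classification; the two training steps described in the abstract can be read as choosing which term is the main target and which is the auxiliary one.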


Pages: 3803-3807

Page count: 5

