On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition

Cited by: 13
Authors
Mirsamadi, Seyedmandad [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, CRSS, Richardson, TX 75080 USA
Keywords
distant speech recognition; recurrent neural network; multi-domain training;
DOI
10.21437/Interspeech.2017-398
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of distant (far-field) speech is a challenge for ASR due to the mismatch in recording conditions caused by room reverberation and environmental noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest in addressing this problem by training robust models on a large corpus of reverberant far-field speech. In this study, we explore how an end-to-end RNN acoustic model trained on speech from different rooms and acoustic conditions (different domains) achieves robustness to environmental variations. It is shown that the first hidden layer acts as a domain separator, projecting the data from different domains into different sub-spaces. The subsequent layers then use this encoded domain knowledge to map these features to final representations that are invariant to domain change. This mechanism is closely related to noise-aware or room-aware approaches which append manually extracted domain signatures to the input features. Additionally, we demonstrate how this understanding of the learning procedure provides useful guidance for model adaptation to new acoustic conditions. We present results based on the AMI corpus to demonstrate the propagation of domain information in a deep RNN, and perform recognition experiments which indicate the role of encoded domain knowledge in the training and adaptation of RNN acoustic models.
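The noise-aware/room-aware idea the abstract relates its findings to can be sketched as follows: a fixed per-utterance domain signature is appended to every acoustic feature frame before the frames enter the acoustic model. The function name, feature dimensions, and the use of random placeholder data below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def append_domain_signature(features, signature):
    """Tile a per-utterance domain signature onto each acoustic frame.

    features:  (num_frames, feat_dim) array of acoustic features
    signature: (sig_dim,) array summarizing the recording condition
               (e.g. an estimated room/noise embedding)
    returns:   (num_frames, feat_dim + sig_dim) augmented features
    """
    num_frames = features.shape[0]
    # Repeat the utterance-level signature once per frame, then
    # concatenate it onto the per-frame feature vectors.
    tiled = np.tile(signature, (num_frames, 1))
    return np.concatenate([features, tiled], axis=1)

# Illustrative example: 100 frames of 40-dim filterbank features
# augmented with a hypothetical 10-dim domain signature.
feats = np.random.randn(100, 40)
sig = np.random.randn(10)
augmented = append_domain_signature(feats, sig)
print(augmented.shape)  # (100, 50)
```

The augmented frames would then be fed to the RNN in place of the raw features, making the domain information explicitly available at the input, much as the paper argues the first hidden layer learns to encode it implicitly.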
Pages: 404 - 408
Page count: 5
Related papers (50 in total)
  • [21] End-to-End Neural Segmental Models for Speech Recognition
    Tang, Hao
    Lu, Liang
    Kong, Lingpeng
    Gimpel, Kevin
    Livescu, Karen
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1254 - 1264
  • [22] Combination of end-to-end and hybrid models for speech recognition
    Wong, Jeremy H. M.
    Gaur, Yashesh
    Zhao, Rui
    Lu, Liang
    Sun, Eric
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2020, 2020, : 1783 - 1787
  • [23] AN INVESTIGATION OF END-TO-END MODELS FOR ROBUST SPEECH RECOGNITION
    Prasad, Archiki
    Jyothi, Preethi
    Velmurugan, Rajbabu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6893 - 6897
  • [24] Achieving End-to-End Connectivity in Global Multi-Domain Networks
    Municio, Esteban
    Cevik, Mert
    Ruth, Paul
    Marquez-Barja, Johann M.
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM WKSHPS 2021), 2021,
  • [25] End-to-end emotional speech recognition using acoustic model adaptation based on knowledge distillation
    Yun, Hong-In
    Park, Jeong-Sik
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (15) : 22759 - 22776
  • [27] Provision of end-to-end QoS in heterogeneous multi-domain networks
    Burakowski, Wojciech
    Beben, Andrzej
    Tarasiuk, Halina
    Sliwinski, Jaroslaw
    Janowski, Robert
    Batalla, Jordi Mongay
    Krawiec, Piotr
    ANNALS OF TELECOMMUNICATIONS, 2008, 63 (11-12) : 559 - 577
  • [28] ConvLab: Multi-Domain End-to-End Dialog System Platform
    Lee, Sungjin
    Zhu, Qi
    Takanobu, Ryuichi
    Zhang, Zheng
    Zhang, Yaoqin
    Li, Xiang
    Li, Jinchao
    Peng, Baolin
    Li, Xiujun
    Huang, Minlie
    Gao, Jianfeng
    PROCEEDINGS OF THE 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS, (ACL 2019), 2019, : 64 - 69
  • [29] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [30] Multi-Stream End-to-End Speech Recognition
    Li, Ruizhi
    Wang, Xiaofei
    Mallidi, Sri Harish
    Watanabe, Shinji
    Hori, Takaaki
    Hermansky, Hynek
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 646 - 655