On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition

Cited by: 13
Authors
Mirsamadi, Seyedmandad [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, CRSS, Richardson, TX 75080 USA
Keywords
distant speech recognition; recurrent neural network; multi-domain training;
DOI
10.21437/Interspeech.2017-398
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of distant (far-field) speech is a challenge for ASR due to the mismatch in recording conditions caused by room reverberation and environmental noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest in addressing this problem by training robust models on a large corpus of reverberant far-field speech. In this study, we explore how an end-to-end RNN acoustic model trained on speech from different rooms and acoustic conditions (different domains) achieves robustness to environmental variations. It is shown that the first hidden layer acts as a domain separator, projecting the data from different domains into different sub-spaces. The subsequent layers then use this encoded domain knowledge to map these features to final representations that are invariant to domain change. This mechanism is closely related to noise-aware or room-aware approaches which append manually extracted domain signatures to the input features. Additionally, we demonstrate how this understanding of the learning procedure provides useful guidance for model adaptation to new acoustic conditions. We present results based on the AMI corpus to demonstrate the propagation of domain information in a deep RNN, and perform recognition experiments which indicate the role of encoded domain knowledge in the training and adaptation of RNN acoustic models.
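The noise-aware/room-aware idea the abstract relates its findings to can be sketched as follows: a fixed per-utterance domain signature is appended to every acoustic feature frame before the frames enter the acoustic model. The function name, feature dimensions, and the use of random placeholder data below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def append_domain_signature(features, signature):
    """Tile a per-utterance domain signature onto each acoustic frame.

    features:  (num_frames, feat_dim) array of acoustic features
    signature: (sig_dim,) array summarizing the recording condition
               (e.g. an estimated room/noise embedding)
    returns:   (num_frames, feat_dim + sig_dim) augmented features
    """
    num_frames = features.shape[0]
    # Repeat the utterance-level signature once per frame, then
    # concatenate it onto the per-frame feature vectors.
    tiled = np.tile(signature, (num_frames, 1))
    return np.concatenate([features, tiled], axis=1)

# Illustrative example: 100 frames of 40-dim filterbank features
# augmented with a hypothetical 10-dim domain signature.
feats = np.random.randn(100, 40)
sig = np.random.randn(10)
augmented = append_domain_signature(feats, sig)
print(augmented.shape)  # (100, 50)
```

The augmented frames would then be fed to the RNN in place of the raw features, making the domain information explicitly available at the input, much as the paper argues the first hidden layer learns to encode it implicitly.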
Pages: 404 - 408
Page count: 5
Related papers (50 in total)
  • [21] End-to-End Neural Segmental Models for Speech Recognition
    Tang, Hao
    Lu, Liang
    Kong, Lingpeng
    Gimpel, Kevin
    Livescu, Karen
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1254 - 1264
  • [22] Combination of end-to-end and hybrid models for speech recognition
    Wong, Jeremy H. M.
    Gaur, Yashesh
    Zhao, Rui
    Lu, Liang
    Sun, Eric
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2020, 2020, : 1783 - 1787
  • [23] AN INVESTIGATION OF END-TO-END MODELS FOR ROBUST SPEECH RECOGNITION
    Prasad, Archiki
    Jyothi, Preethi
    Velmurugan, Rajbabu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6893 - 6897
  • [24] Achieving End-to-End Connectivity in Global Multi-Domain Networks
    Municio, Esteban
    Cevik, Mert
    Ruth, Paul
    Marquez-Barja, Johann M.
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM WKSHPS 2021), 2021,
  • [25] End-to-end emotional speech recognition using acoustic model adaptation based on knowledge distillation
    Yun, Hong-In
    Park, Jeong-Sik
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (15) : 22759 - 22776
  • [27] Provision of end-to-end QoS in heterogeneous multi-domain networks
    Burakowski, Wojciech
    Beben, Andrzej
    Tarasiuk, Halina
    Sliwinski, Jaroslaw
    Janowski, Robert
    Batalla, Jordi Mongay
    Krawiec, Piotr
    ANNALS OF TELECOMMUNICATIONS, 2008, 63 (11-12) : 559 - 577
  • [28] ConvLab: Multi-Domain End-to-End Dialog System Platform
    Lee, Sungjin
    Zhu, Qi
    Takanobu, Ryuichi
    Zhang, Zheng
    Zhang, Yaoqin
    Li, Xiang
    Li, Jinchao
    Peng, Baolin
    Li, Xiujun
    Huang, Minlie
    Gao, Jianfeng
    PROCEEDINGS OF THE 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS, (ACL 2019), 2019, : 64 - 69
  • [29] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [30] Multi-Stream End-to-End Speech Recognition
    Li, Ruizhi
    Wang, Xiaofei
    Mallidi, Sri Harish
    Watanabe, Shinji
    Hori, Takaaki
    Hermansky, Hynek
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 646 - 655