On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition

Cited by: 13
Authors
Mirsamadi, Seyedmahdad [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Richardson, TX 75080 USA
Keywords
distant speech recognition; recurrent neural network; multi-domain training
DOI
10.21437/Interspeech.2017-398
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recognition of distant (far-field) speech is a challenge for ASR due to the mismatch in recording conditions caused by room reverberation and environmental noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest in addressing this problem by using a large corpus of reverberant far-field speech to train robust models. In this study, we explore how an end-to-end RNN acoustic model trained on speech from different rooms and acoustic conditions (different domains) achieves robustness to environmental variations. It is shown that the first hidden layer acts as a domain separator, projecting the data from different domains into different sub-spaces. The subsequent layers then use this encoded domain knowledge to map these features to final representations that are invariant to domain change. This mechanism is closely related to noise-aware or room-aware approaches, which append manually extracted domain signatures to the input features. Additionally, we demonstrate how this understanding of the learning procedure provides useful guidance for model adaptation to new acoustic conditions. We present results on the AMI corpus to demonstrate the propagation of domain information in a deep RNN, and perform recognition experiments which indicate the role of encoded domain knowledge in the training and adaptation of RNN acoustic models.
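The noise-aware or room-aware idea mentioned in the abstract, appending a domain signature to every input frame, can be illustrated with a small sketch. This is not the authors' implementation: the function names `estimate_signature` and `append_signature`, the `n_lead` parameter, and the choice of signature (a per-dimension mean over the first few frames) are assumptions made purely for illustration.

```python
from typing import List, Sequence

Frame = List[float]


def estimate_signature(frames: Sequence[Sequence[float]], n_lead: int = 2) -> Frame:
    """Hypothetical domain signature: per-dimension mean of the first n_lead frames.

    This is only a placeholder for whatever manually extracted descriptor
    (noise estimate, room descriptor, ...) a real system would use.
    """
    lead = frames[:n_lead]
    if not lead:
        raise ValueError("need at least one frame")
    dim = len(lead[0])
    return [sum(f[d] for f in lead) / len(lead) for d in range(dim)]


def append_signature(frames: Sequence[Sequence[float]],
                     signature: Sequence[float]) -> List[Frame]:
    """Return new frames, each extended with the same utterance-level signature."""
    sig = list(signature)
    return [list(f) + sig for f in frames]


# Toy utterance: 3 frames of 2-dimensional features (made-up values).
utterance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
signature = estimate_signature(utterance)
augmented = append_signature(utterance, signature)
```

Each augmented frame carries the original features followed by the signature, so the input dimension grows from 2 to 4 in this toy case. According to the abstract, a multi-domain RNN learns a comparable domain encoding in its first hidden layer without such an explicit input.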
Pages: 404-408
Number of pages: 5
Related papers
50 in total
  • [41] Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture
    Moriya, Takafumi
    Tanaka, Tomohiro
    Ashihara, Takanori
    Ochiai, Tsubasa
    Sato, Hiroshi
    Ando, Atsushi
    Masumura, Ryo
    Delcroix, Marc
    Asami, Taichi
    INTERSPEECH 2021, 2021, : 1787 - 1791
  • [42] Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition
    Kurata, Gakuto
    Saon, George
    INTERSPEECH 2020, 2020, : 2117 - 2121
  • [43] Multi-domain Knowledge Distillation via Uncertainty-Matching for End-to-End ASR Models
    Kim, Ho-Gyeong
    Lee, Min-Joong
    Lee, Hoshik
    Kang, Tae Gyoon
    Lee, Jihyun
    Yang, Eunho
    Hwang, Sung Ju
    INTERSPEECH 2021, 2021, : 2531 - 2535
  • [44] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
    Ghorbani, Shahram
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 762 - 774
  • [45] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82
  • [46] Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
    Gong, Xun
    Lu, Yizhou
    Zhou, Zhikai
    Qian, Yanmin
    INTERSPEECH 2021, 2021, : 1274 - 1278
  • [47] Improving End-to-End Models for Children's Speech Recognition
    Patel, Tanvina
    Scharenborg, Odette
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [48] Incorporating End-to-End Speech Recognition Models for Sentiment Analysis
    Lakomkin, Egor
    Zamani, Mohammad Ali
    Webers, Cornelius
    Magg, Sven
    Wermter, Stefan
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 7976 - 7982
  • [49] End-to-end Shared Restoration Algorithms in Multi-domain Mesh Networks
    Gao, Zhiying
    Naser, Hassan
    2008 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1-3, 2008, : 933 - 938
  • [50] Online Continual Learning of End-to-End Speech Recognition Models
    Yang, Muqiao
    Lane, Ian
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 2668 - 2672