On multi-domain training and adaptation of end-to-end RNN acoustic models for distant speech recognition

Cited by: 13
Authors
Mirsamadi, Seyedmahdad [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Richardson, TX 75080 USA
Keywords
distant speech recognition; recurrent neural network; multi-domain training
DOI
10.21437/Interspeech.2017-398
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recognition of distant (far-field) speech is a challenge for ASR due to the mismatch in recording conditions caused by room reverberation and environmental noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest in addressing this problem by using a large corpus of reverberant far-field speech to train robust models. In this study, we explore how an end-to-end RNN acoustic model trained on speech from different rooms and acoustic conditions (different domains) achieves robustness to environmental variations. It is shown that the first hidden layer acts as a domain separator, projecting the data from different domains into different sub-spaces. The subsequent layers then use this encoded domain knowledge to map these features to final representations that are invariant to domain change. This mechanism is closely related to noise-aware or room-aware approaches which append manually extracted domain signatures to the input features. Additionally, we demonstrate how this understanding of the learning procedure provides useful guidance for model adaptation to new acoustic conditions. We present results on the AMI corpus to demonstrate the propagation of domain information in a deep RNN, and perform recognition experiments which indicate the role of the encoded domain knowledge in the training and adaptation of RNN acoustic models.
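The abstract refers to two concrete ideas: appending a room/noise "domain signature" to the input features (the noise-aware / room-aware approach it compares against) and adapting only the lower layer that encodes the domain when moving to a new acoustic condition. The sketch below is a minimal illustration of those two ideas, not the paper's actual model or recipe: the use of PyTorch, the LSTM layout, all dimensions, the i-vector-like signature, and the first-layer-only adaptation step are assumptions made here for illustration.

# Illustrative sketch only (framework, dimensions, and the signature source are
# assumptions, not details from the paper): an LSTM acoustic model whose input
# frames are concatenated with a fixed per-utterance room/noise signature,
# in the spirit of the noise-aware / room-aware approaches mentioned above.
import torch
import torch.nn as nn

class RoomAwareRNN(nn.Module):
    def __init__(self, feat_dim=40, sig_dim=10, hidden=320, layers=4, n_out=46):
        super().__init__()
        # The first recurrent layer sees features + signature; the abstract
        # argues that multi-domain training makes this layer learn an
        # equivalent domain separation on its own, without the signature.
        self.rnn = nn.LSTM(feat_dim + sig_dim, hidden,
                           num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_out)  # frame-level output posteriors

    def forward(self, feats, signature):
        # feats: (batch, frames, feat_dim); signature: (batch, sig_dim),
        # e.g. an i-vector-like room/noise embedding, repeated at every frame.
        sig = signature.unsqueeze(1).expand(-1, feats.size(1), -1)
        h, _ = self.rnn(torch.cat([feats, sig], dim=-1))
        return self.out(h)

model = RoomAwareRNN()

# Adaptation sketch: if the lower layer carries the domain encoding, one way to
# adapt to a new room is to update only the first LSTM layer (parameter names
# ending in "_l0") and freeze everything else. This is a hypothetical recipe
# consistent with the abstract's argument, not the procedure used in the paper.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("rnn.") and name.endswith("_l0")
adapt_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(adapt_params, lr=1e-3)

Under these assumed sizes, a forward pass is model(feats, signature) with feats of shape (batch, frames, 40) and signature of shape (batch, 10).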
Pages: 404 - 408
Page count: 5
Related papers
50 in total
  • [1] Multi-domain adversarial training of neural network acoustic models for distant speech recognition. Mirsamadi, Seyedmahdad; Hansen, John H. L. SPEECH COMMUNICATION, 2019, 106: 21-30.
  • [2] Unsupervised Domain Adaptation on End-to-End Multi-Talker Overlapped Speech Recognition. Zheng, Lin; Zhu, Han; Tian, Sanli; Zhao, Qingwei; Li, Ta. IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 3119-3123.
  • [3] End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow. Variani, Ehsan; Bagby, Tom; McDermott, Erik; Bacchiani, Michiel. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1641-1645.
  • [4] End-to-End Speech Recognition with Word-Based RNN Language Models. Hori, Takaaki; Cho, Jaejin; Watanabe, Shinji. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 389-396.
  • [5] End-to-End Speech Recognition from Federated Acoustic Models. Gao, Yan; Parcollet, Titouan; Zaiem, Salah; Fernandez-Marques, Javier; de Gusmao, Pedro P. B.; Beutel, Daniel J.; Lane, Nicholas D. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7227-7231.
  • [6] Improved training of end-to-end attention models for speech recognition. Zeyer, Albert; Irie, Kazuki; Schlueter, Ralf; Ney, Hermann. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 7-11.
  • [7] Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. Weng, Chao; Yu, Chengzhu; Cui, Jia; Zhang, Chunlei; Yu, Dong. INTERSPEECH 2020, 2020: 966-970.
  • [8] End-to-end speech recognition with Alignment RNN-Transducer. Tian, Ying; Li, Zerui; Liu, Min; Ouchi, Kazushige; Yan, Long; Zhao, Dan. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021.
  • [9] Improving RNN Transducer Modeling for End-to-End Speech Recognition. Li, Jinyu; Zhao, Rui; Hu, Hu; Gong, Yifan. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 114-121.
  • [10] Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition. Hu, Hu; Zhao, Rui; Li, Jinyu; Lu, Liang; Gong, Yifan. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7079-7083.