SPEAKER ADAPTATION FOR MULTICHANNEL END-TO-END SPEECH RECOGNITION

Citations: 0
Authors
Ochiai, Tsubasa [1]
Watanabe, Shinji [2, 3]
Katagiri, Shigeru [1]
Hori, Takaaki [2]
Hershey, John [2]
Affiliations
[1] Doshisha Univ, Grad Sch Sci & Engn, Kyoto, Japan
[2] Mitsubishi Elect Res Labs, Cambridge, MA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords
multichannel end-to-end ASR; neural beam-former; attention-based encoder-decoder; speaker adaptation; NETWORKS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beam-former, attention mechanism, or decoder network.
Pages: 6707-6711
Page count: 5
Related papers
50 in total
  • [1] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [2] Speaker Adaptation for Attention-Based End-to-End Speech Recognition
    Meng, Zhong
    Gaur, Yashesh
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 241 - 245
  • [3] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [4] Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition
    Gu, Yue
    Du, Zhihao
    Zhang, Shiliang
    Chen, Qian
    Han, Jiqing
    INTERSPEECH 2023, 2023, : 1249 - 1253
  • [5] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
  • [6] End-to-End Multilingual Multi-Speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    INTERSPEECH 2019, 2019, : 3755 - 3759
  • [7] A Purely End-to-end System for Multi-speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
  • [8] Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    Xiao, Xiong
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1274 - 1288
  • [9] Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis
    Fu, Ruibo
    Tao, Jianhua
    Wen, Zhengqi
    Yi, Jiangyan
    Wang, Tao
    Qiang, Chunyu
    INTERSPEECH 2020, 2020, : 4701 - 4705
  • [10] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Ngoc Thang Vu
    INTERSPEECH 2019, 2019, : 4425 - 4429