SPEAKER ADAPTATION FOR MULTICHANNEL END-TO-END SPEECH RECOGNITION

Cited by: 0

Authors:

Ochiai, Tsubasa [1]

Watanabe, Shinji [2,3]

Katagiri, Shigeru [1]

Hori, Takaaki [2]

Hershey, John [2]

Affiliations:

[1] Doshisha Univ, Grad Sch Sci & Engn, Kyoto, Japan

[2] Mitsubishi Elect Res Labs, Cambridge, MA USA

[3] Johns Hopkins Univ, Baltimore, MD 21218 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

multichannel end-to-end ASR; neural beam-former; attention-based encoder-decoder; speaker adaptation; NETWORKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beam-former, attention mechanism, or decoder network.

Pages: 6707-6711

Page count: 5
