Multichannel End-to-end Speech Recognition

Authors
Ochiai, Tsubasa [1]
Watanabe, Shinji [2]
Hori, Takaaki [2]
Hershey, John R. [2]
Affiliations
[1] Doshisha Univ, Kyoto, Japan
[2] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.
Pages: 10