FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Cited by: 0
Authors
Parcollet, Titouan [1 ]
Qiu, Xinchi [1 ]
Lane, Nicholas D. [1 ,2 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Samsung AI, Cambridge, England
Source
INTERSPEECH 2020
Keywords
Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks; NETWORKS;
DOI
10.21437/Interspeech.2020-2102
CLC Numbers
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Distant speech recognition remains a challenging application for modern deep learning based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus improve the speech recognizer's performance. These multi-channel signals follow similar input distributions with respect to the global speech information, but each also contains a significant amount of noise. Consequently, a robust input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task, and consistently outperformed baseline models while maintaining an equal training time.
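The abstract describes a Fusion Layer that applies shared neural parameters across microphone channels to produce a single embedding for a downstream ASR pipeline. Below is a minimal PyTorch sketch of that general idea; the module name FusionLayer, the choice of one shared linear projection applied to every channel, and the sum-then-activation combination rule are assumptions for illustration only, not the authors' exact formulation.

```python
# Minimal sketch of a channel-fusion layer with shared parameters (assumption:
# the same projection is applied to every microphone channel and the results
# are summed before a non-linearity; the abstract does not specify the rule).
import torch
import torch.nn as nn


class FusionLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        # One weight matrix shared by all microphone channels.
        self.shared_proj = nn.Linear(feat_dim, embed_dim)
        self.activation = nn.Tanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, time, feat_dim) acoustic features per microphone.
        projected = self.shared_proj(x)   # same parameters applied to every channel
        fused = projected.sum(dim=1)      # combine channels -> (batch, time, embed_dim)
        return self.activation(fused)


# Usage: the fused embedding can feed any existing ASR encoder, e.g. a (Li-)GRU.
if __name__ == "__main__":
    feats = torch.randn(8, 4, 200, 40)    # 8 utterances, 4 mics, 200 frames, 40 filterbanks
    fusion = FusionLayer(feat_dim=40, embed_dim=256)
    encoder = nn.GRU(input_size=256, hidden_size=512, batch_first=True)
    out, _ = encoder(fusion(feats))       # (8, 200, 512) hidden states for the ASR head
    print(out.shape)
```

Because the projection weights are shared, the parameter count and training time stay essentially the same as for a single-channel front end, which is consistent with the abstract's claim of equal training time.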
Pages: 1678 - 1682
Page count: 5