FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Citations: 0
Authors
Parcollet, Titouan [1]
Qiu, Xinchi [1]
Lane, Nicholas D. [1, 2]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Samsung AI, Cambridge, England
Source
Interspeech 2020
Keywords
Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks; NETWORKS;
DOI
10.21437/Interspeech.2020-2102
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
Distant speech recognition remains a challenging application for modern deep-learning-based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thereby improve the performance of the speech recognizer. These multi-channel signals follow similar input distributions with respect to the global speech information, but they also contain a substantial amount of noise. Consequently, the robustness of the input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task and consistently outperformed baseline models while maintaining an equal training time.
Pages: 1678-1682
Page count: 5