FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Citations: 0

Authors:

Parcollet, Titouan [1]

Qiu, Xinchi [1]

Lane, Nicholas D. [1, 2]

Affiliations:

[1] Univ Oxford, Oxford, England

[2] Samsung AI, Cambridge, England

Source:

INTERSPEECH 2020 | 2020

Keywords:

Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks; NETWORKS;

DOI:

10.21437/Interspeech.2020-2102

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Distant speech recognition remains a challenging application for modern deep-learning-based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus improve the speech recognizer's performance. These multi-channel signals follow similar input distributions with respect to the global speech information but also contain a substantial amount of noise. Consequently, the robustness of the input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task and consistently outperformed baseline models while maintaining an equal training time.


Pages: 1678-1682

Page count: 5

