FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Cited by: 0

Authors:

Parcollet, Titouan [1]

Qiu, Xinchi [1]

Lane, Nicholas D. [1,2]

Affiliations:

[1] Univ Oxford, Oxford, England

[2] Samsung AI, Cambridge, England

Source:

INTERSPEECH 2020 | 2020

Keywords:

Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks; NETWORKS;

DOI:

10.21437/Interspeech.2020-2102

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Distant speech recognition remains a challenging application for modern deep-learning-based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus improve the speech recognizer's performance. These multi-channel signals follow similar input distributions with respect to the global speech information but also contain a significant amount of noise. Consequently, the robustness of the input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task and consistently outperformed baseline models while maintaining an equal training time.


Pages: 1678-1682

Number of pages: 5

