FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Cited by: 0
Authors
Parcollet, Titouan [1 ]
Qiu, Xinchi [1 ]
Lane, Nicholas D. [1 ,2 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Samsung AI, Cambridge, England
Source
INTERSPEECH 2020
Keywords
Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks; NETWORKS;
DOI
10.21437/Interspeech.2020-2102
CLC Numbers
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Distant speech recognition remains a challenging application for modern deep learning based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus improve the speech recognizer's performance. These multi-channel signals follow similar input distributions with respect to the global speech information, but each also contains a significant amount of noise. Consequently, a robust input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task, and consistently outperformed baseline models while maintaining an equal training time.
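The abstract describes a Fusion Layer that applies shared neural parameters across microphone channels to produce a single embedding for a downstream ASR pipeline. Below is a minimal PyTorch sketch of that general idea; the module name FusionLayer, the choice of one shared linear projection applied to every channel, and the sum-then-activation combination rule are assumptions for illustration only, not the authors' exact formulation.

```python
# Minimal sketch of a channel-fusion layer with shared parameters (assumption:
# the same projection is applied to every microphone channel and the results
# are summed before a non-linearity; the abstract does not specify the rule).
import torch
import torch.nn as nn


class FusionLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        # One weight matrix shared by all microphone channels.
        self.shared_proj = nn.Linear(feat_dim, embed_dim)
        self.activation = nn.Tanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, time, feat_dim) acoustic features per microphone.
        projected = self.shared_proj(x)   # same parameters applied to every channel
        fused = projected.sum(dim=1)      # combine channels -> (batch, time, embed_dim)
        return self.activation(fused)


# Usage: the fused embedding can feed any existing ASR encoder, e.g. a (Li-)GRU.
if __name__ == "__main__":
    feats = torch.randn(8, 4, 200, 40)    # 8 utterances, 4 mics, 200 frames, 40 filterbanks
    fusion = FusionLayer(feat_dim=40, embed_dim=256)
    encoder = nn.GRU(input_size=256, hidden_size=512, batch_first=True)
    out, _ = encoder(fusion(feats))       # (8, 200, 512) hidden states for the ASR head
    print(out.shape)
```

Because the projection weights are shared, the parameter count and training time stay essentially the same as for a single-channel front end, which is consistent with the abstract's claim of equal training time.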
Pages: 1678 - 1682
Page count: 5