FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Cited by: 0
Authors
Parcollet, Titouan [1]
Qiu, Xinchi [1]
Lane, Nicholas D. [1,2]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Samsung AI, Cambridge, England
Keywords
Multi-channel distant speech recognition; shared neural parameters; light gated recurrent unit neural networks
DOI
10.21437/Interspeech.2020-2102
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Distant speech recognition remains a challenging application for modern deep learning based Automatic Speech Recognition (ASR) systems, due to complex recording conditions involving noise and reverberation. Multiple microphones are commonly combined with well-known speech processing techniques to enhance the original signals and thus improve the performance of the speech recognizer. These multi-channel signals follow similar input distributions with respect to the global speech information, but each also carries a significant amount of noise. Consequently, the robustness of the input representation is key to obtaining reasonable recognition rates. In this work, we propose a Fusion Layer (FL) based on shared neural parameters. We use it to produce an expressive embedding of multiple microphone signals that can easily be combined with any existing ASR pipeline. The proposed model, called FusionRNN, showed promising results on a multi-channel distant speech recognition task, consistently outperforming baseline models while requiring no additional training time.
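Note on the method: the abstract's Fusion Layer reuses one set of neural parameters across all microphone channels before the recurrent ASR layers. The PyTorch sketch below illustrates one plausible reading of that idea under stated assumptions; the channel-wise summation, the LeakyReLU activation, and all dimension choices are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Sketch of a shared-parameter fusion layer: one weight matrix is
    reused across every microphone channel, and the per-channel
    projections are summed into a single fused embedding."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # A single linear map shared by all channels, so the parameter
        # count stays constant in the number of microphones (assumption
        # drawn from the "shared neural parameters" description).
        self.shared = nn.Linear(feat_dim, hidden_dim)
        self.activation = nn.LeakyReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, feat_dim)
        projected = self.shared(x)       # (batch, time, channels, hidden_dim)
        fused = projected.sum(dim=2)     # sum over the channel axis
        return self.activation(fused)    # (batch, time, hidden_dim)

# Hypothetical usage: 4 microphones, 40-dim filterbank features.
fusion = FusionLayer(feat_dim=40, hidden_dim=512)
signals = torch.randn(8, 200, 4, 40)     # (batch, time, channels, feats)
embedding = fusion(signals)              # (8, 200, 512) -> feed to the RNN/ASR stack
```

Because the projection weights are tied across channels, the fused embedding is invariant to which microphone produced which channel, which is one way such a layer could stay robust to per-channel noise.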
Pages: 1678-1682
Page count: 5
Related Papers
50 items in total
  • [31] Multi-channel handwritten digit recognition using neural networks
    Chi, ZR
    Lu, ZK
    Chan, FH
    ISCAS '97 - PROCEEDINGS OF 1997 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I - IV: CIRCUITS AND SYSTEMS IN THE INFORMATION AGE, 1997, : 625 - 628
  • [32] Fire Recognition Based On Multi-Channel Convolutional Neural Network
    Mao, Wentao
    Wang, Wenpeng
    Dou, Zhi
    Li, Yuan
    FIRE TECHNOLOGY, 2018, 54 (02) : 531 - 554
  • [34] Multi-Channel Automatic Speech Recognition Using Deep Complex UNet
    Kong, Yuxiang
    Wu, Jian
    Wang, Quandong
    Gao, Peng
    Zhuang, Weiji
    Wang, Yujun
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 104 - 110
  • [35] Robust Multi-Channel Speech Recognition Using Frequency Aligned Network
    Park, Taejin
    Kumatani, Kenichi
    Wu, Minhua
    Sundaram, Shiva
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6859 - 6863
  • [36] Performance Monitoring for Automatic Speech Recognition in Noisy Multi-Channel Environments
    Meyer, Bernd T.
    Mallidi, Sri Harish
    Martinez, Angel Mario Castro
    Paya-Vaya, Guillermo
    Kayser, Hendrik
    Hermansky, Hynek
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 50 - 56
  • [37] Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech
    Yu, Jianwei
    Zhang, Shi-Xiong
    Wu, Bo
    Liu, Shansong
    Hu, Shoukang
    Geng, Mengzhe
    Liu, Xunying
    Meng, Helen
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2067 - 2082
  • [38] Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition
    Li, Guinan
    Yu, Jianwei
    Deng, Jiajun
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6042 - 6046
  • [39] MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [40] Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
    Taherian, Hassan
    Wang, Zhong-Qiu
    Chang, Jorge
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1293 - 1302