Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition

Cited by: 58
Authors
Li, Bo [1]
Sainath, Tara N. [1]
Weiss, Ron J. [1]
Wilson, Kevin W. [1]
Bacchiani, Michiel [1]
Affiliations
[1] Google Inc, New York, NY 10011 USA
Keywords
speech recognition; multichannel; beamforming; adaptive filtering
DOI
10.21437/Interspeech.2016-173
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Joint multichannel enhancement and acoustic modeling using neural networks has shown promise over the past few years. However, one shortcoming of previous work [1, 2, 3] is that the filters learned during training are fixed for decoding, potentially limiting the ability of these models to adapt to previously unseen or changing conditions. In this paper we explore a neural network adaptive beamforming (NAB) technique to address this issue. Specifically, we use LSTM layers to predict time-domain beamforming filter coefficients at each input frame. These filters are convolved with the framed time-domain input signal and summed across channels, essentially performing FIR filter-and-sum beamforming using the dynamically adapted filter. The beamformer output is passed into a waveform CLDNN acoustic model [4], which is trained jointly with the filter prediction LSTM layers. We find that the proposed NAB model achieves a 12.7% relative improvement in WER over a single-channel model [4] and reaches similar performance to a "factored" model architecture which utilizes several fixed spatial filters [3] on a 2,000-hour Voice Search task, with a 17.9% decrease in computational cost.
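The filter-and-sum step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame and filter sizes, and the `predict_filters` callable (a hypothetical stand-in for the filter-prediction LSTM layers) are all assumptions made for illustration. Each channel of a frame is convolved with its own FIR taps, and the filtered channels are summed.

```python
def fir_filter(signal, taps):
    """Full linear convolution of one channel's samples with its FIR taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def filter_and_sum(frame, filters):
    """Filter each channel with its own taps, then sum across channels.

    frame:   list of per-channel sample lists for one input frame
    filters: list of per-channel tap lists (one filter per channel)
    """
    if len(frame) != len(filters):
        raise ValueError("need exactly one filter per channel")
    filtered = [fir_filter(x, h) for x, h in zip(frame, filters)]
    length = max(len(y) for y in filtered)
    return [sum(y[t] for y in filtered if t < len(y)) for t in range(length)]


def adaptive_beamform(frames, predict_filters):
    """Apply a freshly predicted set of filters to every frame.

    predict_filters is a hypothetical callable standing in for the
    filter-prediction network; here it only has to map a frame to taps.
    """
    return [filter_and_sum(frame, predict_filters(frame)) for frame in frames]


# Toy example (assumed data): channel 1 is channel 0 delayed by one sample.
frame = [[1.0, 2.0, 3.0, 0.0],
         [0.0, 1.0, 2.0, 3.0]]
# Delay channel 0 by one sample, pass channel 1 through, and average the two.
filters = [[0.0, 0.5],
           [0.5, 0.0]]
aligned = filter_and_sum(frame, filters)
```

In this toy case the two channels are time-aligned by the filters before summation, so the source signal adds coherently. In the adaptive setting the taps would change from frame to frame instead of being fixed as they are here.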
Pages: 1976-1980
Number of pages: 5
Related papers
50 in total
  • [1] ROBUST SPEECH RECOGNITION USING BEAMFORMING WITH ADAPTIVE MICROPHONE GAINS AND MULTICHANNEL NOISE REDUCTION
    Zhao, Shengkui
    Xiao, Xiong
    Zhang, Zhaofeng
    Thi Ngoc Tho Nguyen
    Zhong, Xionghu
    Ren, Bo
    Wang, Longbiao
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 460 - 467
  • [2] DEEP LONG SHORT-TERM MEMORY ADAPTIVE BEAMFORMING NETWORKS FOR MULTICHANNEL ROBUST SPEECH RECOGNITION
    Meng, Zhong
    Watanabe, Shinji
    Hershey, John R.
    Erdogan, Hakan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 271 - 275
  • [3] Robust Adaptive Beamforming Algorithm Based on Neural Network
    Song, Xin
    Wang, Jinkuan
    Niu, Xuefen
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2008, : 1844 - 1849
  • [4] Robust Adaptive Beamforming Based on a Convolutional Neural Network
    Liao, Zhipeng
    Duan, Keqing
    He, Jinjun
    Qiu, Zizhou
    Li, Binbin
    [J]. ELECTRONICS, 2023, 12 (12)
  • [5] Neural network-based robust adaptive beamforming
    Song, Xin
    Wang, Jinkuan
    Han, Yinghua
    Tian, Dan
    [J]. 2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 1758 - +
  • [6] Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    Xiao, Xiong
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1274 - 1288
  • [7] ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
    Prudnikov, Alexey
    Korenevsky, Maxim
    Aleinik, Sergei
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 401 - 408
  • [8] Robust adaptive beamforming via residual convolutional neural network
    Liu, Fulai
    Qin, Dongbao
    Li, Xubin
    Du, Yufeng
    Dou, Xiuquan
    Du, Ruiyan
    [J]. INTERNATIONAL JOURNAL OF MICROWAVE AND WIRELESS TECHNOLOGIES, 2023,
  • [9] Multiresolution Convolutional Neural Network For Robust Speech Recognition
    Naderi, Navid
    Nasersharif, Babak
    [J]. 2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1459 - 1464
  • [10] Deep Neural Network Based Speech Separation for Robust Speech Recognition
    Tu Yanhui
    Jun, Du
    Xu Yong
    Dai Lirong
    Chin-Hui, Lee
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 532 - 536