EXPLORING PRACTICAL ASPECTS OF NEURAL MASK-BASED BEAMFORMING FOR FAR-FIELD SPEECH RECOGNITION

Cited by: 0
Authors
Boeddeker, Christoph [1 ,2 ]
Erdogan, Hakan [1 ]
Yoshioka, Takuya [1 ]
Haeb-Umbach, Reinhold [2 ]
Affiliations
[1] Microsoft AI & Res, Redmond, WA 98052 USA
[2] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
Keywords
Far-field speech recognition; acoustic beamforming; neural networks; time-frequency masks; online processing
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work examines acoustic beamformers employing neural networks (NNs) for mask prediction as front-end for automatic speech recognition (ASR) systems for practical scenarios like voice-enabled home devices. To test the versatility of the mask predicting network, the system is evaluated with different recording hardware, different microphone array designs, and different acoustic models of the downstream ASR system. Significant gains in recognition accuracy are obtained in all configurations despite the fact that the NN had been trained on mismatched data. Unlike previous work, the NN is trained on a feature level objective, which gives some performance advantage over a mask related criterion. Furthermore, different approaches for realizing online, or adaptive, NN-based beamforming are explored, where the online algorithms still show significant gains compared to the baseline performance.
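As a rough illustration of the mask-based beamforming idea summarized in the abstract, the sketch below shows one common way to turn time-frequency masks into beamformer weights: the masks weight the spatial covariance estimates for speech and noise, and an MVDR beamformer is derived from them. The abstract does not state which beamformer or covariance estimator the authors use, so the MVDR formulation, the function name `mask_based_mvdr`, and all parameters here are assumptions made for illustration. The masks are taken as given inputs; in the paper they would come from the neural network.

```python
import numpy as np


def mask_based_mvdr(Y, speech_mask, noise_mask, ref_channel=0,
                    loading=1e-6, eps=1e-10):
    """Illustrative mask-based MVDR beamformer (not the paper's exact method).

    Y:           complex STFT, shape (F, T, C) = (freq bins, frames, mics)
    speech_mask: values in [0, 1], shape (F, T)
    noise_mask:  values in [0, 1], shape (F, T)
    Returns the beamformed single-channel STFT, shape (F, T).
    """
    num_channels = Y.shape[-1]

    def covariance(mask):
        # Mask-weighted spatial covariance: sum_t m[f,t] * y y^H / sum_t m[f,t]
        num = np.einsum('ft,ftc,ftd->fcd', mask, Y, Y.conj())
        return num / np.maximum(mask.sum(axis=1), eps)[:, None, None]

    phi_s = covariance(speech_mask)  # (F, C, C)
    phi_n = covariance(noise_mask)   # (F, C, C)

    # Small diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(phi_n, axis1=1, axis2=2).real / num_channels
    phi_n = phi_n + loading * scale[:, None, None] * np.eye(num_channels)

    # MVDR in the reference-channel form:
    #   w = (Phi_n^{-1} Phi_s) u / trace(Phi_n^{-1} Phi_s),  u = one-hot vector
    numerator = np.linalg.solve(phi_n, phi_s)           # (F, C, C)
    trace = np.trace(numerator, axis1=1, axis2=2)       # (F,)
    w = numerator[:, :, ref_channel] / (trace[:, None] + eps)

    # Apply the weights per frequency bin: x_hat[f, t] = w[f]^H y[f, t]
    return np.einsum('fc,ftc->ft', w.conj(), Y)
```

This version estimates the covariances over the whole utterance, i.e. it is an offline (batch) beamformer. An online variant, as mentioned in the abstract, would instead update the covariance estimates recursively over time; that is not shown here.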
Pages: 6697-6701
Number of pages: 5