EXPLORING PRACTICAL ASPECTS OF NEURAL MASK-BASED BEAMFORMING FOR FAR-FIELD SPEECH RECOGNITION

Authors
Boeddeker, Christoph [1, 2]
Erdogan, Hakan [1]
Yoshioka, Takuya [1]
Haeb-Umbach, Reinhold [2]
Affiliations
[1] Microsoft AI & Res, Redmond, WA 98052 USA
[2] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
Keywords
Far-field speech recognition; acoustic beamforming; neural networks; time-frequency masks; online processing
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work examines acoustic beamformers that employ neural networks (NNs) for mask prediction as a front-end for automatic speech recognition (ASR) systems in practical scenarios such as voice-enabled home devices. To test the versatility of the mask-predicting network, the system is evaluated with different recording hardware, different microphone array designs, and different acoustic models of the downstream ASR system. Significant gains in recognition accuracy are obtained in all configurations even though the NN was trained on mismatched data. Unlike previous work, the NN is trained on a feature-level objective, which gives some performance advantage over a mask-related criterion. Furthermore, different approaches for realizing online, or adaptive, NN-based beamforming are explored; the online algorithms still show significant gains over the baseline performance.
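Illustrative note: the abstract does not state which beamformer or mask estimator is used, so the following is only a minimal sketch of the general mask-based beamforming idea, under stated assumptions. It assumes that speech and noise time-frequency masks are already available (in the paper they would come from the NN), estimates spatial covariance matrices from them, and derives an MVDR beamformer in a reference-microphone formulation. The function names `masked_covariance`, `mvdr_weights`, and `beamform`, the array layout `(F, T, M)`, and the diagonal-loading constant are hypothetical choices made for this sketch.

```python
import numpy as np


def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, M) = (bins, frames, microphones)
    mask: real-valued weights in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    # Sum over frames of mask * y y^H, normalized by the mask mass.
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return num / norm[:, None, None]


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) e_ref / trace(Phi_n^-1 Phi_s).

    Assumption: MVDR is used here only as an example beamformer.
    phi_s, phi_n: speech and noise covariances, shape (F, M, M)
    Returns weights of shape (F, M).
    """
    M = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = eps * np.trace(phi_n, axis1=-2, axis2=-1).real / M
    phi_n = phi_n + load[:, None, None] * np.eye(M)
    numer = np.linalg.solve(phi_n, phi_s)            # (F, M, M)
    tr = np.trace(numer, axis1=-2, axis2=-1)         # (F,)
    return numer[:, :, ref] / (tr[:, None] + 1e-10)


def beamform(Y, w):
    """Apply w^H y to every time-frequency bin; returns shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

Typical use would be `phi_s = masked_covariance(Y, speech_mask)`, `phi_n = masked_covariance(Y, noise_mask)`, then `beamform(Y, mvdr_weights(phi_s, phi_n))`. An online variant, which the abstract mentions without detail, could update the covariance estimates recursively per frame or block instead of over the whole utterance.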
Pages: 6697-6701
Page count: 5