EXPLORING PRACTICAL ASPECTS OF NEURAL MASK-BASED BEAMFORMING FOR FAR-FIELD SPEECH RECOGNITION

Authors
Boeddeker, Christoph [1, 2]
Erdogan, Hakan [1]
Yoshioka, Takuya [1]
Haeb-Umbach, Reinhold [2]
Affiliations
[1] Microsoft AI & Res, Redmond, WA 98052 USA
[2] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
Keywords
Far-field speech recognition; acoustic beamforming; neural networks; time-frequency masks; online processing
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This work examines acoustic beamformers that employ neural networks (NNs) for mask prediction as the front-end of automatic speech recognition (ASR) systems in practical scenarios such as voice-enabled home devices. To test the versatility of the mask-predicting network, the system is evaluated with different recording hardware, different microphone array designs, and different acoustic models in the downstream ASR system. Significant gains in recognition accuracy are obtained in all configurations, even though the NN was trained on mismatched data. Unlike previous work, the NN is trained with a feature-level objective, which yields some performance advantage over a mask-related criterion. Furthermore, different approaches to realizing online, or adaptive, NN-based beamforming are explored; the online algorithms still show significant gains over the baseline.
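The abstract does not state which beamformer the predicted masks drive. As an illustration only, the sketch below assumes the widely used mask-based MVDR formulation of Souden et al.: masks weight the multichannel STFT to estimate speech and noise spatial covariance matrices (SCMs), and the weights are w = Phi_N^{-1} Phi_X u / trace(Phi_N^{-1} Phi_X), with u selecting a reference microphone. The masks are taken as given (in the paper they come from the NN). The function names, the diagonal loading, and the forgetting factor `alpha` in the online variant are hypothetical choices, not the authors' settings, and recursive averaging is only one possible way to make the beamformer online.

```python
import numpy as np


def masked_scm(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrices (SCMs), one per frequency.

    Y:    complex STFT, shape (F, T, D) = (frequencies, frames, microphones)
    mask: real-valued mask, shape (F, T), values in [0, 1]
    returns an array of shape (F, D, D)
    """
    # Phi[f] = sum_t mask[f, t] * y[f, t] y[f, t]^H / sum_t mask[f, t]
    scm = np.einsum("ft,ftd,fte->fde", mask, Y, Y.conj())
    return scm / np.maximum(mask.sum(axis=1), eps)[:, None, None]


def mvdr_souden(phi_x, phi_n, ref=0, diag_load=1e-6):
    """MVDR weights w[f] = Phi_N^{-1} Phi_X u / trace(Phi_N^{-1} Phi_X).

    u is the one-hot vector selecting microphone `ref`.
    Returns complex weights of shape (F, D).
    """
    D = phi_n.shape[-1]
    # Assumed diagonal loading (hypothetical choice) keeps Phi_N invertible.
    load = diag_load * np.trace(phi_n, axis1=-2, axis2=-1).real / D
    phi_n = phi_n + load[:, None, None] * np.eye(D)
    ratio = np.linalg.solve(phi_n, phi_x)        # Phi_N^{-1} Phi_X, (F, D, D)
    trace = np.trace(ratio, axis1=-2, axis2=-1)  # (F,)
    return ratio[..., ref] / trace[:, None]      # column `ref` = ratio @ u


def apply_beamformer(w, Y):
    """Enhanced STFT X_hat[f, t] = w[f]^H y[f, t], shape (F, T)."""
    return np.einsum("fd,ftd->ft", w.conj(), Y)


def online_mvdr(Y, speech_mask, noise_mask, alpha=0.95, ref=0):
    """Frame-by-frame variant: SCMs are recursively averaged with forgetting
    factor `alpha`, so frame t only uses frames 0..t. The Souden weights are
    invariant to scaling either SCM, so the sums need no normalization."""
    F, T, D = Y.shape
    phi_x = np.tile(1e-6 * np.eye(D, dtype=complex), (F, 1, 1))
    phi_n = phi_x.copy()
    out = np.zeros((F, T), dtype=complex)
    for t in range(T):
        y = Y[:, t, :]                                # (F, D)
        outer = np.einsum("fd,fe->fde", y, y.conj())  # y y^H per frequency
        phi_x = alpha * phi_x + speech_mask[:, t, None, None] * outer
        phi_n = alpha * phi_n + noise_mask[:, t, None, None] * outer
        w = mvdr_souden(phi_x, phi_n, ref=ref)
        out[:, t] = np.einsum("fd,fd->f", w.conj(), y)
    return out
```

A quick property check: if Phi_X is exactly rank one, h h^H, then w^H h equals h[ref] for any invertible noise SCM, i.e. the beamformer passes the target at the reference microphone undistorted. The offline version estimates the SCMs once per utterance, while the online version updates them every frame, trading estimation accuracy for latency.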
Pages: 6697-6701
Number of pages: 5