Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Cited by: 3
Authors
Masuyama, Yoshiki [1 ,2 ]
Togami, Masahito [2 ]
Komatsu, Tatsuya [2 ]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Tokyo, Japan
[2] LINE Corp, Tokyo, Japan
Source
INTERSPEECH 2019
Keywords
Speaker-independent multi-talker separation; neural beamformer; multichannel Itakura-Saito divergence
DOI
10.21437/Interspeech.2019-1289
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied to many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have been employed. Although such training criteria are simple, they do not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions which evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained with the multichannel loss functions can be applied to construct several beamformers. Experimental results confirmed their effectiveness and robustness to microphone configurations.
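The mask-based beamforming pipeline the abstract refers to can be sketched as follows. This is not the paper's proposed training objective, only the standard inference-time chain it builds on: a DNN-estimated TF mask weights the observed STFT to form speech and noise spatial covariance matrices, from which a per-frequency MVDR beamformer (Souden-style, trace-normalized) is derived. All function names and the toy data below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def spatial_covariances(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrices.

    X    : (F, T, M) complex multichannel STFT of the mixture
    mask : (F, T) real-valued TF mask in [0, 1] (e.g. a DNN output)
    Returns (Rs, Rn), each of shape (F, M, M), for speech and noise.
    """
    # R[f] = sum_t m[f,t] * x[f,t] x[f,t]^H / sum_t m[f,t]
    Rs = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    Rs /= np.maximum(mask.sum(axis=1), eps)[:, None, None]
    Rn = np.einsum("ft,ftm,ftn->fmn", 1.0 - mask, X, X.conj())
    Rn /= np.maximum((1.0 - mask).sum(axis=1), eps)[:, None, None]
    return Rs, Rn

def mvdr_weights(Rs, Rn, ref=0, eps=1e-8):
    """Per-frequency MVDR weights: w_f = (Rn^-1 Rs / tr(Rn^-1 Rs)) e_ref."""
    F, M, _ = Rs.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        # Solve Rn w = Rs instead of forming an explicit inverse.
        num = np.linalg.solve(Rn[f] + eps * np.eye(M), Rs[f])
        W[f] = (num / np.trace(num))[:, ref]
    return W

# Toy usage with random data (illustration only, not real speech).
rng = np.random.default_rng(0)
X = rng.standard_normal((257, 100, 4)) + 1j * rng.standard_normal((257, 100, 4))
mask = rng.uniform(size=(257, 100))
Rs, Rn = spatial_covariances(X, mask)
W = mvdr_weights(Rs, Rn)
Y = np.einsum("fm,ftm->ft", W.conj(), X)  # beamformed output, shape (F, T)
```

A monaural loss would evaluate only `Y`; the multichannel losses described in the abstract instead evaluate the covariance estimates `Rs`/`Rn` themselves, which is what ties the training criterion to the beamformer.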
Pages: 2708-2712 (5 pages)
Related Papers (50 in total)
  • [31] ON SPATIAL FEATURES FOR SUPERVISED SPEECH SEPARATION AND ITS APPLICATION TO BEAMFORMING AND ROBUST ASR
    Wang, Zhong-Qiu
    Wang, DeLiang
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5709 - 5713
  • [32] FRAME-BY-FRAME CLOSED-FORM UPDATE FOR MASK-BASED ADAPTIVE MVDR BEAMFORMING
    Higuchi, Takuya
    Kinoshita, Keisuke
    Ito, Nobutaka
    Karita, Shigeki
    Nakatani, Tomohiro
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 531 - 535
  • [33] Partially RepRapable automated open source bag valve mask-based ventilator
    Petsiuk, Aliaksei
    Tanikella, Nagendra G.
    Dertinger, Samantha
    Pringle, Adam
    Oberloier, Shane
    Pearce, Joshua M.
    HARDWAREX, 2020, 8 (08):
  • [34] Beamforming-based Speech Enhancement based on Optimal Ratio Mask
    Ji, Qiang
    Bao, Changchun
    Cheng, Rui
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [35] LINEAR MULTICHANNEL BLIND SOURCE SEPARATION BASED ON TIME-FREQUENCY MASK OBTAINED BY HARMONIC/PERCUSSIVE SOUND SEPARATION
    Oyabu, Soichiro
    Kitamura, Daichi
    Yatabe, Kohei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 201 - 205
  • [36] DNN-BASED SPEECH MASK ESTIMATION FOR EIGENVECTOR BEAMFORMING
    Pfeifenberger, Lukas
    Zoehrer, Matthias
    Pernkopf, Franz
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 66 - 70
  • [37] Investigation of Cost Function for Supervised Monaural Speech Separation
    Liu, Yun
    Zhang, Hui
    Zhang, Xueliang
    Cao, Yuhang
    INTERSPEECH 2019, 2019, : 3178 - 3182
  • [38] A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation
    Ikram, MZ
    Morgan, DR
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 881 - 884
  • [39] Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
    Liu, Yun
    Zhang, Hui
    Zhang, Xueliang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1151 - 1155
  • [40] UNSUPERVISED TRAINING FOR DEEP SPEECH SOURCE SEPARATION WITH KULLBACK-LEIBLER DIVERGENCE BASED PROBABILISTIC LOSS FUNCTION
    Togami, Masahito
    Masuyama, Yoshiki
    Komatsu, Tatsuya
    Nakagome, Yu
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2020, : 56 - 60