Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Cited by: 3
Authors
Masuyama, Yoshiki [1, 2]
Togami, Masahito [2]
Komatsu, Tatsuya [2]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Tokyo, Japan
[2] LINE Corporation, Tokyo, Japan
Source
Keywords
Speaker-independent multi-talker separation; neural beamformer; multichannel Itakura-Saito divergence
DOI
10.21437/Interspeech.2019-1289
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied to many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have been employed. Although such a training criterion is simple, it does not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained with the multichannel loss functions can be applied to construct several beamformers. Experimental results confirmed their effectiveness and robustness to microphone configurations.
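As a rough illustration of the two quantities the abstract mentions, the sketch below estimates a mask-weighted spatial covariance matrix per frequency bin and compares two such matrices with the commonly used log-determinant form of the multichannel Itakura-Saito divergence, tr(A B^-1) - log det(A B^-1) - M. It is not taken from the paper: the function names, array shapes, normalization by the mask sum, and the diagonal loading `eps` are all assumptions made for illustration, and the paper's actual loss functions may differ.

```python
import numpy as np


def masked_scm(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrices (hypothetical helper).

    X:    complex STFT, shape (F, T, M) = (freq bins, frames, microphones)
    mask: TF mask in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    # Sum over frames of mask[f, t] * x[f, t] x[f, t]^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    # Normalize by the total mask weight in each frequency bin (assumed choice)
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def multichannel_is_divergence(A, B, eps=1e-6):
    """tr(A B^-1) - log det(A B^-1) - M for Hermitian matrices (..., M, M)."""
    M = A.shape[-1]
    # Small diagonal loading keeps both matrices invertible (assumed choice)
    load = eps * np.eye(M)
    P = (A + load) @ np.linalg.inv(B + load)
    trace = np.trace(P, axis1=-2, axis2=-1).real
    _, logdet = np.linalg.slogdet(P)
    return trace - logdet - M
```

The divergence is zero when the two matrices coincide and grows as they differ, which is what allows it to score estimated spatial covariance matrices against reference ones rather than scoring the masks themselves.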
Pages: 2708-2712
Page count: 5
Related papers
50 in total
  • [21] ENHANCEMENT OF CODED SPEECH USING A MASK-BASED POST-FILTER
    Korse, Srikanth
    Gupta, Kishan
    Fuchs, Guillaume
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6764 - 6768
  • [22] Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
    Subramanian, Aswin Shanmugam
    Chen, Szu-Jui
    Watanabe, Shinji
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3249 - 3253
  • [23] A multichannel beamforming-based framework for speech extraction
    Hidri, Adel
    Amiri, Hamid
    INTERNATIONAL JOURNAL OF INTELLIGENT ENGINEERING INFORMATICS, 2015, 3 (2-3) : 273 - 291
  • [24] A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction
    Souden, Mehrez
    Araki, Shoko
    Kinoshita, Keisuke
    Nakatani, Tomohiro
    Sawada, Hiroshi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (09): : 1913 - 1928
  • [25] A soft masking strategy based on multichannel speech probability estimation for source separation and robust speech recognition
    Hoffmann, Eugen
    Kolossa, Dorothea
    Orglmeister, Reinhold
    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 117 - 120
  • [26] Using Optimal Ratio Mask as Training Target for Supervised Speech Separation
    Xia, Shasha
    Li, Hao
    Zhang, Xueliang
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 163 - 166
  • [27] Speech enhancement and source separation supported by negative Beamforming Filtering
    Alvarez, A
    Gómez, P
    Nieto, V
    Martínez, R
    Rodellar, V
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 342 - 345
  • [28] A NEW MASK-BASED OBJECTIVE MEASURE FOR PREDICTING THE INTELLIGIBILITY OF BINARY MASKED SPEECH
    Yu, Chengzhu
    Wojcicki, Kamil K.
    Loizou, P. C.
    Hansen, John H. L.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7030 - 7033
  • [29] Multichannel speech separation using adaptive parameterization of source PDFs
    Kokkinakis, K
    Nandi, AK
    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, 2004, 3195 : 486 - 493
  • [30] Multichannel blind deconvolution for source separation in convolutive mixtures of speech
    Kokkinakis, K
    Nandi, AK
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): : 200 - 212