Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

Cited by: 3
Authors
Masuyama, Yoshiki [1 ,2 ]
Togami, Masahito [2 ]
Komatsu, Tatsuya [2 ]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Tokyo, Japan
[2] LINE Corporation, Tokyo, Japan
Source
INTERSPEECH 2019
Keywords
Speaker-independent multi-talker separation; neural beamformer; multichannel Itakura-Saito divergence
DOI
10.21437/Interspeech.2019-1289
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained with multichannel loss functions. Beamforming with time-frequency (TF) masks estimated by a DNN has been applied in many applications, where the TF-masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have typically been employed. Although such training criteria are simple, they do not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained with the multichannel loss functions can be applied to construct several beamformers. Experimental results confirmed their effectiveness and robustness to microphone configurations.
Pages: 2708 - 2712
Number of pages: 5
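
Below is a minimal NumPy sketch of the ingredients described in the abstract: mask-weighted estimation of spatial covariance matrices, an MVDR beamformer built from those matrices, and a multichannel Itakura-Saito divergence between an estimated and a reference covariance matrix. This is an illustrative reconstruction from the abstract only, not the authors' implementation; the function names, the MVDR formulation, and the reference-microphone choice are assumptions, and the paper's actual loss functions and beamformers may differ.

import numpy as np
# Illustrative sketch assumed from the abstract; not the authors' code.


def masked_scm(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrices (SCMs), one per frequency bin.

    X    : (F, T, M) complex multichannel STFT of the mixture
    mask : (F, T)    TF-mask for one source, values in [0, 1]
    Returns an (F, M, M) array of SCM estimates.
    """
    num = np.einsum('ft,ftm,ftn->fmn', mask, X, X.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(R_s, R_n, ref_mic=0, eps=1e-8):
    """MVDR beamformer w(f) = R_n^{-1} R_s u / tr(R_n^{-1} R_s), per frequency bin."""
    _, M, _ = R_s.shape
    numer = np.linalg.solve(R_n + eps * np.eye(M), R_s)        # R_n^{-1} R_s, shape (F, M, M)
    denom = np.trace(numer, axis1=1, axis2=2)[:, None, None] + eps
    return (numer / denom)[:, :, ref_mic]                      # (F, M): column of the reference mic


def multichannel_is_divergence(R_hat, R_ref, eps=1e-8):
    """Multichannel Itakura-Saito divergence, summed over frequency bins:
    D(R_hat, R_ref) = tr(R_hat R_ref^{-1}) - log det(R_hat R_ref^{-1}) - M.
    """
    _, M, _ = R_hat.shape
    prod = R_hat @ np.linalg.inv(R_ref + eps * np.eye(M))
    trace = np.trace(prod, axis1=1, axis2=2).real
    _, logdet = np.linalg.slogdet(prod)                        # log|det|, real for PSD products
    return float(np.sum(trace - logdet - M))


# Toy usage: beamform one source from a 2-channel mixture given a (here random) mask.
F, T, M = 257, 100, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
mask = rng.uniform(size=(F, T))                                # stands in for a DNN output

R_s = masked_scm(X, mask)                                      # target SCM
R_n = masked_scm(X, 1.0 - mask)                                # interference + noise SCM
w = mvdr_weights(R_s, R_n)                                     # (F, M) beamformer
Y = np.einsum('fm,ftm->ft', w.conj(), X)                       # beamformed STFT

loss = multichannel_is_divergence(R_s, R_s + 1e-3 * np.eye(M)) # example divergence value

In training, the divergence would compare the mask-derived covariance estimates against oracle covariance matrices computed from the clean sources; the random data above only exercises the shapes and formulas.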