INTEGRATING DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING

Cited by: 0
Authors
Nakatani, Tomohiro [1]
Ito, Nobutaka [1]
Higuchi, Takuya [1]
Araki, Shoko [1]
Kinoshita, Keisuke [1]
Affiliation
[1] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai, Kyoto 6190237, Japan
Keywords
Beamforming; automatic speech recognition; time-frequency mask; deep neural network; spatial clustering; SEPARATION; CLASSIFICATION
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recently, time-frequency mask-based beamforming has been extensively studied as the front end of deep neural network (DNN)-based automatic speech recognition (ASR) in noisy environments. Two mask estimation approaches have been developed separately for this beamforming method: the DNN-based approach, which exploits the time-frequency features of the signal, and the spatial clustering-based approach, which exploits its spatial features. This paper proposes a new method that integrates the two approaches probabilistically, further improving mask estimation by exploiting the advantages of both. Experiments using the real data of the CHiME-3 multichannel noisy speech corpus show that the proposed method almost always outperforms the conventional approaches in terms of word error rate (WER) improvement.
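The abstract names the processing chain (mask estimation, then mask-based MVDR beamforming) without giving formulas. The NumPy sketch below illustrates one common form of each step under stated assumptions: spatial covariance matrices are mask-weighted averages of outer products, the MVDR filter uses the Souden formulation with a reference microphone `ref`, and the integration step treats the DNN mask as a fixed time-frequency prior in a two-class, zero-mean complex Gaussian spatial model refined by EM. That integration is a hypothetical simplification for illustration, not necessarily the paper's probabilistic formulation, and the function names (`spatial_covariance`, `mvdr_souden`, `integrate_masks`, `beamform`) are illustrative.

```python
import numpy as np


def spatial_covariance(Y, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Y: complex STFT with shape (F, T, M); mask: real weights with shape (F, T).
    Returns an array of shape (F, M, M).
    """
    # Sum of mask-weighted outer products y y^H over frames.
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    # Normalize by the total mask weight (guarded against all-zero masks).
    den = np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]
    return num / den


def mvdr_souden(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights in the Souden formulation, shape (F, M).

    w = Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s), where u selects mic `ref`.
    """
    M = phi_s.shape[-1]
    # Diagonal loading keeps the noise covariance invertible.
    phi_n = phi_n + eps * np.eye(M)[None]
    ratio = np.linalg.solve(phi_n, phi_s)            # Phi_n^{-1} Phi_s
    trace = np.trace(ratio, axis1=1, axis2=2)        # shape (F,)
    return ratio[:, :, ref] / trace[:, None]


def integrate_masks(Y, dnn_mask, n_iter=5, eps=1e-6):
    """HYPOTHETICAL integration (an assumption, not the paper's exact model).

    The DNN mask is a fixed time-frequency prior for the speech class of a
    two-class, zero-mean complex Gaussian spatial model; EM refines the
    posterior. Returns a speech mask of shape (F, T).
    """
    M = Y.shape[-1]
    prior = np.clip(dnn_mask, eps, 1.0 - eps)
    post = prior.copy()
    for _ in range(n_iter):
        loglik = []
        for gamma in (post, 1.0 - post):
            # M-step: class covariance from the current posterior.
            R = spatial_covariance(Y, gamma) + eps * np.eye(M)[None]
            _, logdet = np.linalg.slogdet(R)                 # shape (F,)
            quad = np.einsum("ftm,fmn,ftn->ft",
                             Y.conj(), np.linalg.inv(R), Y).real
            # Complex Gaussian log-density without the constant -M*log(pi).
            loglik.append(-logdet[:, None] - quad)
        # E-step: speech posterior as a numerically stable sigmoid of log-odds.
        log_odds = (np.log(prior) + loglik[0]) - (np.log(1.0 - prior) + loglik[1])
        post = 0.5 * (1.0 + np.tanh(0.5 * log_odds))
    return post


def beamform(Y, dnn_mask, ref=0):
    """Integrated mask -> covariances -> MVDR filter -> enhanced STFT (F, T)."""
    mask = integrate_masks(Y, dnn_mask)
    w = mvdr_souden(spatial_covariance(Y, mask),
                    spatial_covariance(Y, 1.0 - mask), ref=ref)
    # Apply the filter: x_hat(f, t) = w(f)^H y(f, t).
    return np.einsum("fm,ftm->ft", w.conj(), Y), mask


# Usage on random placeholder data (not CHiME-3).
rng = np.random.default_rng(0)
F, T, M = 4, 50, 3
Y = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
X_hat, mask = beamform(Y, rng.uniform(size=(F, T)))
print(X_hat.shape, mask.shape)  # (4, 50) (4, 50)
```

Writing the sigmoid with `tanh` avoids overflow when the log-odds are large. With a real multichannel STFT and a trained DNN mask in place of the random arrays, `X_hat` would be the enhanced single-channel STFT passed on to ASR.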
Pages: 286-290
Page count: 5
Related papers
50 in total
  • [1] ONLINE INTEGRATION OF DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
    Matsui, Yutaro
    Nakatani, Tomohiro
    Delcroix, Marc
    Kinoshita, Keisuke
    Ito, Nobutaka
    Araki, Shoko
    Makino, Shoji
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018: 71-75
  • [2] DNN-BASED MASK ESTIMATION INTEGRATING SPECTRAL AND SPATIAL FEATURES FOR ROBUST BEAMFORMING
    Deng, Chengyun
    Song, Hui
    Zhang, Yi
    Sha, Yongtao
    Li, Xiangang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2020: 4647-4651
  • [3] DNN-BASED SPEECH MASK ESTIMATION FOR EIGENVECTOR BEAMFORMING
    Pfeifenberger, Lukas
    Zoehrer, Matthias
    Pernkopf, Franz
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 66-70
  • [4] ROBUST MASK ESTIMATION BY INTEGRATING NEURAL NETWORK-BASED AND CLUSTERING-BASED APPROACHES FOR ADAPTIVE ACOUSTIC BEAMFORMING
    Zhou, Ying
    Qian, Yanmin
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 536-540
  • [5] DNN-based Intelligent Beamforming on a Programmable Metasurface
    Li, Shangyang
    Fu, Shilei
    Xu, Feng
    [J]. Journal of Radars, 2021, 10(02): 259-266
  • [6] DNN-based speaker clustering for speaker diarisation
    Milner, Rosanna
    Hain, Thomas
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2185-2189
  • [7] DNN-BASED DISTRIBUTED MULTICHANNEL MASK ESTIMATION FOR SPEECH ENHANCEMENT IN MICROPHONE ARRAYS
    Furnon, Nicolas
    Serizel, Romain
    Illina, Irina
    Essid, Slim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2020: 4672-4676
  • [8] DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays
    Furnon, Nicolas
    Serizel, Romain
    Essid, Slim
    Illina, Irina
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 2310-2323
  • [9] Comparative Study on DNN-based Minimum Variance Beamforming Robust to Small Movements of Sound Sources
    Saijo, Kohei
    Katagiri, Kazuhiro
    Fujieda, Masaru
    Kobayashi, Tetsunori
    Ogawa, Tetsuji
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 603-607
  • [10] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Zhao, Shengkui
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 3246-3250