Continuous Speech Separation with Ad Hoc Microphone Arrays

Cited by: 0
Authors
Wang, Dongmei [1]
Yoshioka, Takuya [1]
Chen, Zhuo [1]
Wang, Xiaofei [1]
Zhou, Tianyan [1]
Meng, Zhong [1]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
Keywords
ad hoc microphone array; speech separation; spatially distributed microphones; speaker counting
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech separation has been shown to be effective for multi-talker speech recognition. Under the ad hoc microphone array setup, where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome because the geometry and number of microphones are unknown beforehand. Prior studies show that, with a spatial-temporal-interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array. In this paper, we further extend this approach to continuous speech separation. Several techniques are introduced to enable speech separation for real continuous recordings. First, we apply a transformer-based network for spatio-temporal modeling of the ad hoc array signals. In addition, two methods are proposed to mitigate a speech duplication problem during single-talker segments, which seems more severe in ad hoc array scenarios. One method is device distortion simulation, which reduces the acoustic mismatch between simulated training data and real recordings. The other is speaker counting, which detects the single-speaker segments and merges the output signal channels. Experimental results for AdHoc-LibriCSS, a new dataset consisting of continuous recordings of concatenated LibriSpeech utterances obtained by multiple different devices, show that the proposed separation method can significantly improve ASR accuracy for overlapped speech with little performance degradation for single-talker segments.
Pages: 1100-1104
Page count: 5
Related papers
50 in total
  • [1] A Framework for Speech Enhancement With Ad Hoc Microphone Arrays
    Tavakoli, Vincent Mohammad
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    Benesty, Jacob
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (06) : 1038 - 1051
  • [2] A PARTITIONED APPROACH TO SIGNAL SEPARATION WITH MICROPHONE AD HOC ARRAYS
    Tavakoli, Vincent Mohammad
    Jensen, Jesper Rindom
    Benesty, Jacob
    Christensen, Mads Graesboll
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 3221 - 3225
  • [3] DISTRIBUTED MAX-SINR SPEECH ENHANCEMENT WITH AD HOC MICROPHONE ARRAYS
    Tavakoli, Vincent M.
    Jensen, Jesper R.
    Heusdens, Richard
    Benesty, Jacob
    Christensen, Mads G.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 151 - 155
  • [4] Glottal Model Based Speech Beamforming for Ad-Hoc Microphone Arrays
    Zhang, Yang
    Florencio, Dinei
    Hasegawa-Johnson, Mark
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2675 - 2679
  • [5] COMMUNICATION-COST AWARE MICROPHONE SELECTION FOR NEURAL SPEECH ENHANCEMENT WITH AD-HOC MICROPHONE ARRAYS
    Casebeer, Jonah
    Kaikaus, Jamshed
    Smaragdis, Paris
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8438 - 8442
  • [6] Sound Localization for Ad-Hoc Microphone Arrays
    Liaquat, Muhammad Usman
    Munawar, Hafiz Suliman
    Rahman, Amna
    Qadir, Zakria
    Kouzani, Abbas Z.
    Mahmud, M. A. Parvez
    [J]. ENERGIES, 2021, 14 (12)
  • [7] Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
    Chen, Junqi
    Zhang, Xiao-Lei
    [J]. INTERSPEECH 2021, 2021, : 291 - 295
  • [8] PSEUDO-COHERENCE-BASED MVDR BEAMFORMER FOR SPEECH ENHANCEMENT WITH AD HOC MICROPHONE ARRAYS
    Tavakoli, Vincent Mohammad
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    Benesty, Jacob
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2659 - 2663
  • [9] NEAR-FIELD SOURCE EXTRACTION USING SPEECH PRESENCE PROBABILITIES FOR AD HOC MICROPHONE ARRAYS
    Taseska, Maja
    Markovich-Golan, Shmulik
    Habets, Emanuel A. P.
    Gannot, Sharon
    [J]. 2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2014, : 169 - 173
  • [10] Detecting multiple, simultaneous talkers through localising speech recorded by ad-hoc microphone arrays
    Pasha, Shahab
    Ritz, Christian
    Zou, Y. X.
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,