Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

Authors
Dai, Wang [1]
Li, Xiaofei [2, 3]
Politis, Archontis [1 ]
Virtanen, Tuomas [1 ]
Affiliations
[1] Tampere Univ, Audio Res Grp, Tampere, Finland
[2] Westlake Univ, Hangzhou, Peoples R China
[3] Westlake Inst Adv Study, Hangzhou, Peoples R China
Keywords
reference channel selection; multi-channel masking; end-to-end multi-channel speech enhancement
DOI
10.23919/EUSIPCO63174.2024.10715275
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. This limitation is most pronounced in two scenarios: large distributed microphone arrays with varying speaker-to-microphone distances, and compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it impossible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is selected according to the highest scale-invariant signal-to-distortion ratio (SI-SDR). Experimental results on the Spear challenge simulated dataset D4 demonstrate that the proposed method outperforms the conventional approach of using a fixed reference channel with single-channel masking.
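The selection step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes real-valued masks applied to time-domain signals for brevity (the paper's masking domain is not stated in this record), and the function names `si_sdr`, `combine_masked_channels`, and `select_reference` are hypothetical. It only shows the idea of summing masked channels into one output and then picking the clean channel with the highest SI-SDR as the training target.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB) of two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove any scale mismatch.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def combine_masked_channels(mixture, masks):
    """Apply one mask per channel and sum over channels.

    mixture, masks: arrays of shape (channels, samples).
    Returns a single-channel signal of shape (samples,).
    Assumption: element-wise real masks in the time domain.
    """
    return np.sum(masks * mixture, axis=0)


def select_reference(enhanced, clean_channels):
    """Return the index of the clean channel with the highest SI-SDR, plus all scores."""
    scores = np.array([si_sdr(enhanced, c) for c in clean_channels])
    return int(np.argmax(scores)), scores


# Toy data: three microphones see the same source with different gains and delays.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
clean = np.stack([
    1.0 * np.roll(speech, 0),
    0.5 * np.roll(speech, 3),
    0.2 * np.roll(speech, 7),
])
mixture = clean + 0.1 * rng.standard_normal(clean.shape)

# Hypothetical masks that happen to pass only channel 1.
masks = np.zeros_like(mixture)
masks[1] = 1.0

enhanced = combine_masked_channels(mixture, masks)
best, scores = select_reference(enhanced, clean)
print("selected reference channel:", best)
```

In a training loop, the negative SI-SDR against the selected channel would serve as the loss, so the target follows whichever channel the combined output matches best instead of a channel fixed in advance.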
Pages: 241-245
Page count: 5