MONAURAL SOURCE SEPARATION: FROM ANECHOIC TO REVERBERANT ENVIRONMENTS

Cited: 9
Authors
Cord-Landwehr, Tobias [1 ]
Boeddeker, Christoph [1 ]
Von Neumann, Thilo [1 ]
Zorila, Catalin [2 ]
Doddipatla, Rama [2 ]
Haeb-Umbach, Reinhold [1 ]
Affiliations
[1] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
[2] Toshiba Cambridge Res Lab, Cambridge, England
Keywords
speech separation; deep learning; SepFormer; automatic speech recognition; reverberation
DOI
10.1109/IWAENC53105.2022.9914794
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Impressive progress in neural network-based single-channel speech source separation has been made in recent years. However, those improvements have mostly been reported on anechoic data, a situation that is rarely met in practice. Taking the SepFormer as a starting point, which achieves state-of-the-art performance on anechoic mixtures, we gradually modify it to optimize its performance on reverberant mixtures. Although this leads to a word error rate improvement of 7 percentage points compared to the standard SepFormer implementation, the system ends up with only marginally better performance than a PIT-BLSTM separation system that is optimized with rather straightforward means. This is surprising and at the same time sobering, challenging the practical usefulness of many improvements reported in recent years for monaural source separation on nonreverberant data.
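The PIT-BLSTM baseline mentioned in the abstract relies on permutation invariant training (PIT): because a separator's output channels have no fixed speaker order, the loss is evaluated under every assignment of estimates to targets and the minimum is used. A minimal NumPy sketch of an utterance-level PIT loss follows; the MSE criterion and the function name are illustrative assumptions, not the authors' implementation.

```python
import itertools

import numpy as np


def pit_mse_loss(estimates, targets):
    """Utterance-level permutation invariant training (uPIT) loss.

    Evaluates the MSE under every permutation of the estimated sources
    and returns the minimum, so the network is not penalized for
    producing the speakers in a different output order.
    Both arrays have shape (num_speakers, num_samples).
    """
    num_spk = estimates.shape[0]
    best = None
    for perm in itertools.permutations(range(num_spk)):
        # MSE for this particular assignment of estimates to targets
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if best is None or loss < best:
            best = loss
    return best
```

The factorial scan over permutations is cheap for the two- or three-speaker mixtures typical of this line of work; for many sources, Hungarian-style assignment is used instead.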
Pages: 5