MONAURAL SOURCE SEPARATION: FROM ANECHOIC TO REVERBERANT ENVIRONMENTS

Cited by: 9
Authors
Cord-Landwehr, Tobias [1 ]
Boeddeker, Christoph [1 ]
Von Neumann, Thilo [1 ]
Zorila, Catalin [2 ]
Doddipatla, Rama [2 ]
Haeb-Umbach, Reinhold [1 ]
Affiliations
[1] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
[2] Toshiba Cambridge Res Lab, Cambridge, England
Keywords
speech separation; deep learning; SepFormer; automatic speech recognition; reverberation
DOI
10.1109/IWAENC53105.2022.9914794
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
Impressive progress in neural network-based single-channel speech source separation has been made in recent years. However, those improvements have mostly been reported on anechoic data, a situation that is rarely encountered in practice. Taking the SepFormer, which achieves state-of-the-art performance on anechoic mixtures, as a starting point, we gradually modify it to optimize its performance on reverberant mixtures. Although this leads to a word error rate improvement of 7 percentage points over the standard SepFormer implementation, the resulting system performs only marginally better than a PIT-BLSTM separation system that is optimized with rather straightforward means. This is surprising and at the same time sobering, and it challenges the practical usefulness of many improvements reported in recent years for monaural source separation on non-reverberant data.
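The PIT-BLSTM baseline mentioned above relies on utterance-level permutation invariant training (PIT): because the assignment of network outputs to reference speakers is arbitrary, the training loss is minimized over all speaker permutations. The following is a minimal sketch of that criterion, not code from the paper; the function names pit_loss and si_sdr are illustrative assumptions, and PyTorch is assumed as the framework.

# Hypothetical sketch of utterance-level PIT with a negative SI-SDR loss.
# Shapes: estimates and targets are (batch, speakers, samples).
import itertools
import torch


def si_sdr(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Scale-invariant SDR in dB for time-domain signals, computed over the last axis.
    target = target - target.mean(dim=-1, keepdim=True)
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to obtain the scaled reference.
    scale = (estimate * target).sum(-1, keepdim=True) / (target.pow(2).sum(-1, keepdim=True) + eps)
    projection = scale * target
    noise = estimate - projection
    return 10 * torch.log10(projection.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)


def pit_loss(estimates: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Negative SI-SDR, minimized over all assignments of outputs to speakers.
    num_spk = targets.shape[1]
    losses = []
    for perm in itertools.permutations(range(num_spk)):
        # Average loss over speakers for this particular output-to-speaker assignment.
        losses.append(-si_sdr(estimates[:, list(perm)], targets).mean(dim=1))
    # Pick the best assignment per utterance, then average over the batch.
    return torch.stack(losses, dim=0).min(dim=0).values.mean()

Because the loss is evaluated per utterance and the best permutation is chosen inside the loss, the separator is free to emit the speakers in any order, which is what makes a simple BLSTM mask estimator trainable on speaker mixtures in the first place.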
Pages: 5