Bi-Sep: A Multi-Resolution Cross-Domain Monaural Speech Separation Framework

Cited by: 0
Authors
Ho, Kuan-Hsun [1]
Hung, Jeih-weih [2]
Chen, Berlin [1]
Affiliations
[1] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[2] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Keywords
speech separation; reverberation; SepFormer; cross-domain; bi-projection fusion; multi-resolution; PHASE; FILTERBANK; NOISY
DOI
10.1109/TAAI57707.2022.00022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, deep neural network (DNN)-based time-domain methods for monaural speech separation have improved substantially under anechoic conditions. However, the performance of these methods degrades under harsher conditions, such as noise or reverberation. Although adopting the Short-Time Fourier Transform (STFT) for feature extraction in these neural methods helps stabilize performance in non-anechoic situations, it inherently loses the fine-grained view of the signal that is one of the distinctive strengths of time-domain methods. Therefore, this study explores combining time-domain and STFT-domain features to retain the beneficial characteristics of both. Furthermore, we leverage a Bi-Projection Fusion (BPF) mechanism to merge the information from the two domains. To evaluate the effectiveness of the proposed method, we conduct experiments in an anechoic setting on the WSJ0-2mix dataset and in noisy/reverberant settings on the WHAM!/WHAMR! datasets. The experiments show that, at the cost of negligible degradation on the anechoic dataset, the proposed method improves the performance of existing neural models in more complicated environments.
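The idea of pairing a short-window time-domain representation with a longer-window STFT representation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, window lengths, and filter count are hypothetical, a random matrix stands in for a learned time-domain encoder, and plain concatenation stands in for the Bi-Projection Fusion mechanism, whose details are not given in this record.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def time_domain_features(x, basis, hop):
    """Short-window features: project each frame onto a filter bank, then ReLU.
    `basis` (n_filters, frame_len) is a random stand-in for a learned encoder."""
    frames = frame_signal(x, basis.shape[1], hop)
    return np.maximum(frames @ basis.T, 0.0)

def stft_features(x, frame_len, hop):
    """Long-window features: magnitude of a Hann-windowed STFT."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))

def fuse(time_feats, stft_feats):
    """Assumed stand-in for fusion: truncate both streams to the same number
    of frames and concatenate along the feature axis (this is NOT BPF)."""
    n = min(len(time_feats), len(stft_feats))
    return np.concatenate([time_feats[:n], stft_feats[:n]], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)            # one second of noise at 8 kHz
basis = rng.standard_normal((64, 16))    # 64 hypothetical filters, 16-sample window
fused = fuse(time_domain_features(x, basis, hop=8),
             stft_features(x, frame_len=256, hop=8))
print(fused.shape)  # (969, 193)
```

Both streams share the same hop size, so frame `t` in each starts at the same sample. Only the window length differs (16 versus 256 samples), which is one simple way to read "multi-resolution" here.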
Pages: 72-77
Number of pages: 6
Related papers
21 in total
  • [1] Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation
    Zhao, Lei
    Zhu, Wenbo
    Li, Shengqiang
    Luo, Hong
    Zhang, Xiao-Lei
    Rahardja, Susanto
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2338 - 2351
  • [2] Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
    Yuan, Weitao
    Dong, Bofei
    Wang, Shengbei
    Unoki, Masashi
    Wang, Wenwu
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 (29) : 807 - 822
  • [3] CMRFusion: A cross-domain multi-resolution fusion method for infrared and visible image fusion
    Xiong, Zhang
    Cao, Yuanjia
    Zhang, Xiaohui
    Hu, Qingping
    Han, Hongwei
    [J]. OPTICS AND LASERS IN ENGINEERING, 2023, 170
  • [4] Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
    Grais, Emad M.
    Wierstorf, Hagen
    Ward, Dominic
    Plumbley, Mark D.
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2018), 2018, 10891 : 340 - 350
  • [5] CROSS-DOMAIN COOPERATIVE DEEP STACKING NETWORK FOR SPEECH SEPARATION
    Jiang, Wei
    Liang, Shan
    Dong, Like
    Yang, Hong
    Liu, Wenju
    Wang, Yunji
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 5083 - 5087
  • [6] Multi-resolution Stacking for Speech Separation Based on Boosted DNN
    Zhang, Xiao-Lei
    Wang, DeLiang
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 1745 - 1749
  • [7] Decoupling-style monaural speech enhancement with a triple-branch cross-domain fusion network
    Chen, Wenzhuo
    Yu, Runxiang
    Ye, Zhongfu
    [J]. APPLIED ACOUSTICS, 2024, 217
  • [8] Improved Speech Separation with Time-and-Frequency Cross-Domain Feature Selection
    Lan, Tian
    Qian, Yuxin
    Lyu, Yilan
    Mokhosi, Refuoe
    Tai, Wenxin
    Liu, Qiao
    [J]. INTERSPEECH 2021, 2021: 3525 - 3529
  • [9] Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering
    Yang, Gene-Ping
    Tuan, Chao-I
    Lee, Hung-Yi
    Lee, Lin-shan
    [J]. INTERSPEECH 2019, 2019: 1363 - 1367
  • [10] A Multi-level Security Access Control Framework for Cross-Domain Networks
    Zhang, Hongbin
    Chang, Jiang
    Wang, Junshe
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE) AND IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC), VOL 2, 2017: 316 - 319