FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks

被引:49
|
作者
Zhang, Liwen [1 ]
Shi, Ziqiang [2 ]
Han, Jiqing [1 ]
Shi, Anyan [3 ]
Ma, Ding [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Fujitsu Res & Dev Ctr, Beijing 100027, Peoples R China
[3] Shuangfeng First, Beijing, Peoples R China
来源
关键词
Speech separation; Cocktail party problem; Sequence modeling; Temporal convolutional networks; Permutation invariant training; PERFORMANCE;
D O I
10.1007/978-3-030-37731-1_53
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Deep dilated temporal convolutional networks (TCN) have been proved to be very effective in sequence modeling. In this paper we propose several improvements of TCN for end-to-end approach to monaural speech separation, which consists of (1) multi-scale dynamic weighted gated TCN with a pyramidal structure (FurcaPy), (2) gated TCN with intra-parallel convolutional components (FurcaPa), (3) weight-shared multi-scale gated TCN (FurcaSh) and (4) dilated TCN with gated subtractive-convolutional component (FurcaSu). All these networks take the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. For the objective, we propose to train the networks by directly optimizing utterance-level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix data corpus result in 18.4 dB SDR improvement, which shows our proposed networks can lead to performance improvement on the speaker separation task.
引用
收藏
页码:653 / 665
页数:13
相关论文
共 50 条
  • [1] Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Han, Jiqing
    Shi, Anyan
    [J]. INTERSPEECH 2019, 2019, : 3183 - 3187
  • [2] End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Hayakawa, Shoji
    Harada, Shouji
    Han, Jiqing
    [J]. INTERSPEECH 2019, 2019, : 4614 - 4618
  • [3] FURCAX: END-TO-END MONAURAL SPEECH SEPARATION BASED ON DEEP GATED (DE)CONVOLUTIONAL NEURAL NETWORKS WITH ADVERSARIAL EXAMPLE TRAINING
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Hayakawa, Shoji
    Han, Jiqing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6985 - 6989
  • [4] End-to-End Monaural Speech Separation with a Deep Complex U-Shaped Network
    Zhang, Wen
    Li, Xiaoyong
    Zhou, Aolong
    Deng, Kefeng
    Ren, Kaijun
    Song, Junqiang
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (02)
  • [5] VERY DEEP CONVOLUTIONAL NETWORKS FOR END-TO-END SPEECH RECOGNITION
    Zhang, Yu
    Chan, William
    Jaitly, Navdeep
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4845 - 4849
  • [6] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
    Parcollet, Titouan
    Zhang, Ying
    Morchid, Mohamed
    Trabelsi, Chiheb
    Linares, Georges
    De Mori, Renato
    Bengio, Yoshua
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26
  • [7] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [8] Gated End-to-End Memory Networks
    Liu, Fei
    Perez, Julien
    [J]. 15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 1 - 10
  • [9] Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks
    Zhang, Wei
    Zhai, Minghao
    Huang, Zilong
    Liu, Chen
    Li, Wei
    Cao, Yi
    [J]. INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PART VI, 2019, 11745 : 332 - 341
  • [10] Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement
    Tan, Ke
    Chen, Jitong
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 189 - 198