Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Cited: 485
Authors
Takahashi, Kohei [1 ]
Shiraishi, Toshihiko [1 ]
Affiliations
[1] Yokohama Natl Univ, Grad Sch Environm & Informat Sci, 79-7 Tokiwadai,Hodogaya Ku, Yokohama, Kanagawa 2408501, Japan
Keywords
Deep encoder; Deep decoder; Deep learning; Speech separation; Time-domain
DOI
10.1109/ICASSP.2017.7952154
CLC number
TH [Machinery and Instrumentation Industry];
Discipline code
0802;
Abstract
Previous research on speech separation has significantly improved separation performance with the time-domain approach, which consists of an encoder, a separator, and a decoder. Most research has focused on revising the architecture of the separator; in contrast, a single 1-D convolution layer and a single 1-D transposed convolution layer have typically been used as the encoder and decoder, respectively. This study proposes deep encoder and decoder architectures, consisting of stacked 1-D convolution layers, 1-D transposed convolution layers, or residual blocks, for time-domain speech separation. The aim of revising these components is to improve separation performance and, by enhancing their mapping ability, to overcome the tradeoff between separation performance and computational cost caused by their stride. We applied the proposed architectures to Conv-TasNet, a typical model for time-domain speech separation. Our results indicate that better separation performance is achieved as the number of layers increases, and that increasing the number of layers from 1 to 12 yields an SI-SDR improvement of more than 1 dB on WSJ0-2mix. Additionally, the results suggest that the encoder and decoder should be made deeper in proportion to their stride, since their task may become more difficult as the stride grows. This study demonstrates the importance of improving these architectures, not only the separators.
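The core idea in the abstract, replacing the usual single strided 1-D convolution encoder with a stack of layers where only the first layer carries the stride, can be sketched as follows. This is a hypothetical, dependency-free illustration, not the authors' implementation: the kernels, the single-channel simplification, and the omission of nonlinearities and of the Conv-TasNet separator are all assumptions made for brevity.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution on a single channel (pure Python)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def deep_encoder(signal, kernels, stride):
    """Deep encoder sketch: the first layer carries the stride (and so
    the downsampling / computational-cost tradeoff); the extra layers
    run at stride 1 and only add mapping capacity."""
    out = conv1d(signal, kernels[0], stride=stride)
    for k in kernels[1:]:
        out = conv1d(out, k, stride=1)  # nonlinearities omitted for brevity
    return out

x = [float(i % 5) for i in range(64)]       # toy single-channel waveform
shallow = conv1d(x, [0.5, 0.5], stride=2)   # conventional 1-layer encoder
deep = deep_encoder(
    x, [[0.5, 0.5], [1.0, -1.0], [0.25, 0.25, 0.25]], stride=2
)
print(len(shallow), len(deep))              # frame counts differ only
                                            # by the valid-conv trimming
```

Because the stride appears only in the first layer, the deep variant keeps roughly the same frame rate (and hence separator cost) as the shallow one while adding representational depth, which is the tradeoff the abstract describes.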
Pages: 13
Related papers
50 records
  • [1] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245
  • [2] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [3] Recognizing Multi-talker Speech with Permutation Invariant Training
    Yu, Dong
    Chang, Xuankai
    Qian, Yanmin
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2456 - 2460
  • [4] Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
    Wang, Jing
    Luo, Yiyu
    Yi, Weiming
    Xie, Xiang
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 766 - 777
  • [5] Single-channel multi-talker speech recognition with permutation invariant training
    Qian, Yanmin
    Chang, Xuankai
    Yu, Dong
    [J]. SPEECH COMMUNICATION, 2018, 104 : 1 - 11
  • [6] PERMUTATION INVARIANT TRAINING FOR SPEAKER-INDEPENDENT MULTI-PITCH TRACKING
    Liu, Yuzhou
    Wang, DeLiang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5594 - 5598
  • [7] ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5974 - 5978
  • [8] JOINT SEPARATION AND DENOISING OF NOISY MULTI-TALKER SPEECH USING RECURRENT NEURAL NETWORKS AND PERMUTATION INVARIANT TRAINING
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [9] Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation
    Saleem, Nasir
    Khattak, Muhammad Irfan
    [J]. APPLIED ACOUSTICS, 2020, 167
  • [10] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718