Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Cited: 485
Authors
Takahashi, Kohei [1 ]
Shiraishi, Toshihiko [1 ]
Affiliations
[1] Yokohama Natl Univ, Grad Sch Environm & Informat Sci, 79-7 Tokiwadai,Hodogaya Ku, Yokohama, Kanagawa 2408501, Japan
Keywords
Deep encoder; Deep decoder; Deep learning; Speech separation; Time-domain
DOI
10.1109/ICASSP.2017.7952154
CLC number
TH [Machinery and Instrumentation Industry];
Discipline code
0802;
Abstract
Previous research on speech separation has significantly improved separation performance with the time-domain approach, which consists of an encoder, a separator, and a decoder. Most research has focused on revising the architecture of the separator; in contrast, a single 1-D convolution layer and a single 1-D transposed convolution layer have typically been used as the encoder and decoder, respectively. This study proposes deep encoder and decoder architectures, consisting of stacked 1-D convolution layers, 1-D transposed convolution layers, or residual blocks, for time-domain speech separation. The aim of revising these components is to improve separation performance and, by enhancing their mapping ability, to overcome the tradeoff between separation performance and computational cost caused by their stride. We applied the proposed architectures to Conv-TasNet, a typical model for time-domain speech separation. Our results indicate that better separation performance is achieved as the number of layers increases, and that increasing the number of layers from 1 to 12 yields an SI-SDR improvement of more than 1 dB on WSJ0-2mix. Additionally, the results suggest that the encoder and decoder should be made deeper in proportion to their stride, since their task may become more difficult as the stride grows. This study demonstrates the importance of improving these architectures, not only the separators.
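The core idea in the abstract, replacing the usual single strided 1-D convolution encoder with a stack of layers where only the first layer carries the stride, can be sketched as follows. This is a hypothetical, dependency-free illustration, not the authors' implementation: the kernels, the single-channel simplification, and the omission of nonlinearities and of the Conv-TasNet separator are all assumptions made for brevity.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution on a single channel (pure Python)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def deep_encoder(signal, kernels, stride):
    """Deep encoder sketch: the first layer carries the stride (and so
    the downsampling / computational-cost tradeoff); the extra layers
    run at stride 1 and only add mapping capacity."""
    out = conv1d(signal, kernels[0], stride=stride)
    for k in kernels[1:]:
        out = conv1d(out, k, stride=1)  # nonlinearities omitted for brevity
    return out

x = [float(i % 5) for i in range(64)]       # toy single-channel waveform
shallow = conv1d(x, [0.5, 0.5], stride=2)   # conventional 1-layer encoder
deep = deep_encoder(
    x, [[0.5, 0.5], [1.0, -1.0], [0.25, 0.25, 0.25]], stride=2
)
print(len(shallow), len(deep))              # frame counts differ only
                                            # by the valid-conv trimming
```

Because the stride appears only in the first layer, the deep variant keeps roughly the same frame rate (and hence separator cost) as the shallow one while adding representational depth, which is the tradeoff the abstract describes.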
Pages: 13
Related papers
50 records
  • [1] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245
  • [2] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [3] Recognizing Multi-talker Speech with Permutation Invariant Training
    Yu, Dong
    Chang, Xuankai
    Qian, Yanmin
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2456 - 2460
  • [4] Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
    Wang, Jing
    Luo, Yiyu
    Yi, Weiming
    Xie, Xiang
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 766 - 777
  • [5] Single-channel multi-talker speech recognition with permutation invariant training
    Qian, Yanmin
    Chang, Xuankai
    Yu, Dong
    [J]. SPEECH COMMUNICATION, 2018, 104 : 1 - 11
  • [6] PERMUTATION INVARIANT TRAINING FOR SPEAKER-INDEPENDENT MULTI-PITCH TRACKING
    Liu, Yuzhou
    Wang, DeLiang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5594 - 5598
  • [7] ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5974 - 5978
  • [8] JOINT SEPARATION AND DENOISING OF NOISY MULTI-TALKER SPEECH USING RECURRENT NEURAL NETWORKS AND PERMUTATION INVARIANT TRAINING
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [9] Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation
    Saleem, Nasir
    Khattak, Muhammad Irfan
    [J]. APPLIED ACOUSTICS, 2020, 167
  • [10] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718