SOURCE-AWARE CONTEXT NETWORK FOR SINGLE-CHANNEL MULTI-SPEAKER SPEECH SEPARATION

Citations: 0

Authors:

Li, Zeng-Xi [1]

Song, Yan [1]

Dai, Li-Rong [1]

McLoughlin, Ian [2]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] Univ Kent, Sch Comp, Medway, England

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

Speech Separation; Deep Learning; Label Permutation Problem; NEURAL-NETWORKS; DEEP;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Deep learning based approaches have achieved promising performance in speaker-dependent single-channel multi-speaker speech separation. However, partly due to the label permutation problem, they may encounter difficulties in speaker-independent conditions. Recent methods address this problem through assignment operations. In contrast, we propose a novel source-aware context network, which explicitly takes the speech sources as well as the mixture signal as input. By exploiting the temporal dependency and continuity of the same source signal, the permutation order of the outputs can be easily determined without any additional post-processing. Furthermore, a Multi-time-step Prediction Training strategy is proposed to address the mismatch between the training and inference stages. Experimental results on the benchmark WSJ0-2mix dataset show that our network achieves comparable or better results than state-of-the-art methods in both closed-set and open-set conditions, in terms of Signal-to-Distortion Ratio (SDR) improvement.
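
The label permutation problem mentioned in the abstract can be illustrated with a small sketch. It is not the source-aware context network proposed in the paper. It only shows, on toy data, why a fixed output-to-speaker order can give a misleading error, and how an assignment step of the kind the abstract attributes to other recent methods picks the best pairing. The function names (`mse`, `best_assignment`) and the toy signals are assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_assignment(estimates, references):
    """Search all output-to-speaker pairings and return (lowest error, pairing).

    pairing[i] is the index of the reference matched to estimates[i].
    Hypothetical helper for illustration only; exhaustive search is
    practical only for a small number of speakers.
    """
    n = len(estimates)
    best_err, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        err = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if err < best_err:
            best_err, best_perm = err, perm
    return best_err, best_perm


# Toy data (assumed): both sources are estimated well, but in swapped order.
references = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
estimates = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]

# Error when output i is always compared with speaker i.
fixed_order_err = sum(mse(e, r) for e, r in zip(estimates, references)) / 2

# Error after choosing the best pairing.
perm_err, perm = best_assignment(estimates, references)

print(fixed_order_err, perm_err, perm)
```

With a fixed order the error is large even though both sources are recovered well; the assignment step finds the swapped pairing and a small error. According to the abstract, the proposed network avoids this extra step by using the temporal continuity of each source.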


Pages: 681-685

Page count: 5
