Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization

Cited by: 105

Authors:

Yoshioka, Takuya [1]

Nakatani, Tomohiro [1]

Miyoshi, Masato [2]

Okuno, Hiroshi G. [3]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

[2] Kanazawa Univ, Grad Sch Nat Sci & Technol, Kanazawa, Ishikawa 9201192, Japan

[3] Kyoto Univ, Grad Sch Informat, Dept Intelligence Sci & Technol, Kyoto 6068501, Japan

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 01

Keywords:

Blind source separation (BSS); blind dereverberation (BD); conditional separation and dereverberation (CSD); CONVOLUTIVE MIXTURES; IDENTIFICATION; DECONVOLUTION; SUPPRESSION; ALGORITHMS; SIGNALS;

DOI:

10.1109/TASL.2010.2045183

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.
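The tandem structure described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's equations, so everything here is an assumption made for illustration: the dereverberation stage is modelled as delayed multichannel linear prediction in a single frequency bin, the separation stage as one instantaneous matrix, and the names `dereverberate`, `separate`, `G`, `W`, and `delay` are hypothetical. Only the forward pass is shown; the alternating optimization of the prediction and separation matrices is not.

```python
import numpy as np

def dereverberate(x, G, delay):
    """Assumed dereverberation stage for one frequency bin.

    x: (T, M) STFT frames from M microphones.
    G: (K, M, M) hypothetical prediction matrices.
    Computes y[t] = x[t] - sum_k G[k]^H x[t - delay - k].
    """
    T = x.shape[0]
    y = x.astype(complex)  # astype returns a copy, so x is not modified
    for k in range(G.shape[0]):
        shift = delay + k
        if shift >= T:
            break
        # Row-vector form of G[k]^H applied to the delayed frames.
        y[shift:] -= x[:T - shift] @ G[k].conj()
    return y

def separate(y, W):
    """Assumed separation stage: s_hat[t] = W y[t] for each frame."""
    return y @ W.T

# Toy check: build observations that follow the assumed model exactly,
# then undo them with the true (normally unknown) parameters.
rng = np.random.default_rng(0)
T, M, delay = 200, 2, 2
s = rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))
A = np.array([[1.0, 0.5], [0.3, 1.0]], dtype=complex)   # toy mixing matrix
G = np.array([[[0.4, 0.1j], [-0.1j, 0.3]]])             # one prediction tap

d = s @ A.T                      # instantaneous mixture of the sources
x = np.zeros_like(d)
for t in range(T):
    x[t] = d[t]
    if t >= delay:
        x[t] += x[t - delay] @ G[0].conj()   # recursive "reverberant" part

y = dereverberate(x, G, delay)
s_hat = separate(y, np.linalg.inv(A))
print(np.allclose(s_hat, s))
```

In a blind setting neither `G` nor `W` is known; the sketch only shows how the output of the first stage feeds the second, which is why the two sets of parameters depend on each other.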

Pages: 69-84

Page count: 16
