Structured Sparsity Models for Reverberant Speech Separation

Cited by: 42
Authors
Asaei, Afsaneh [1 ,2 ]
Golbabaee, Mohammad [3 ]
Bourlard, Herve [1 ,2 ]
Cevher, Volkan [4 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[3] Univ Paris 09, Appl Math Res Ctr CEREMADE, F-75016 Paris, France
[4] Ecole Polytech Fed Lausanne, Dept Elect Engn, CH-1015 Lausanne, Switzerland
Keywords
Distant speech recognition; image model; multiparty reverberant recordings; room acoustic modeling; source separation; structured sparse recovery; BLIND SOURCE SEPARATION; DECOMPOSITION; FRAMEWORK; SIGNALS;
DOI
10.1109/TASLP.2013.2297012
CLC Classification
O42 [Acoustics];
Subject Classification
070206 ; 082403 ;
Abstract
We tackle the speech separation problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform speech recovery and room acoustic modeling from recordings of concurrent unknown sources. The speakers are assumed to lie on a two-dimensional plane, and the multipath channel is characterized using the image model. We propose an algorithm for room geometry estimation that relies on localizing the early images of the speakers via sparse approximation of the spatial spectrum of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through convex optimization, exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated to separate the individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real recordings of spatially stationary sources demonstrate the effectiveness of the proposed approach for speech separation and recognition.
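The image model referenced in the abstract replaces each wall reflection with a "virtual source" mirrored across that wall; the early room impulse response is then a sum of attenuated, delayed copies of the source arriving from these image positions. A minimal sketch of the 2-D first-order case for a rectangular room (illustrative only; function and variable names are not from the paper, and the paper's actual estimation pipeline is far more involved):

```python
import numpy as np

def first_order_images(source, room):
    """Mirror a 2-D source across each wall of a rectangular room
    [0, Lx] x [0, Ly], yielding the four first-order image sources.

    source : (x, y) position of the true source
    room   : (Lx, Ly) room dimensions
    """
    x, y = source
    Lx, Ly = room
    return np.array([
        [-x, y],            # reflection off the wall x = 0
        [2 * Lx - x, y],    # reflection off the wall x = Lx
        [x, -y],            # reflection off the wall y = 0
        [x, 2 * Ly - y],    # reflection off the wall y = Ly
    ])

# Each image contributes a delayed, attenuated copy of the source signal:
# delay = distance / c; the gain depends on the wall absorption coefficient,
# which the paper estimates via convex optimization.
mic = np.array([1.0, 1.0])
images = first_order_images((2.0, 1.5), (5.0, 4.0))
delays = np.linalg.norm(images - mic, axis=1) / 343.0  # seconds, c = 343 m/s
```

Because each image position is a deterministic function of the room geometry, localizing the early images (as the paper does by sparse recovery over a free-space grid) pins down the wall positions, which is the "unique map to the room geometry" mentioned above.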
Pages: 620 - 633
Page count: 14
Related Papers
50 records in total
  • [11] Separation of Reverberant Speech Based on Computational Auditory Scene Analysis
    Li Hongyan
    Cao Meng
    Wang Yue
    AUTOMATIC CONTROL AND COMPUTER SCIENCES, 2018, 52 (06) : 561 - 571
  • [12] Blind speech separation of moving speakers in real reverberant environments
    Koutras, A
    Dermatas, E
    Kokkinakis, G
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1133 - 1136
  • [13] Fast convergence speech source separation in reverberant acoustic environment
    Zhao, YX
    Hu, R
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 897 - 900
  • [14] Deep Learning Based Binaural Speech Separation in Reverberant Environments
    Zhang, Xueliang
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05) : 1075 - 1084
  • [15] Binaural reverberant Speech separation based on deep neural networks
    Zhang, Xueliang
    Wang, DeLiang
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2018 - 2022
  • [16] SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments
    Wang, Liusong
    Gao, Yuan
    Cao, Kaimin
    Hu, Ying
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 44 - 54
  • [18] WHAMR!: NOISY AND REVERBERANT SINGLE-CHANNEL SPEECH SEPARATION
    Maciejewski, Matthew
    Wichern, Gordon
    McQuinn, Emmett
    Le Roux, Jonathan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 696 - 700
  • [19] A multiresolution approach to blind separation of speech signals in a reverberant environment
    Ikram, MZ
    Morgan, DR
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2001, : 2757 - 2760
  • [20] RECURRENT NEURAL NETWORKS FOR COCHANNEL SPEECH SEPARATION IN REVERBERANT ENVIRONMENTS
    Delfarah, Masood
    Wang, DeLiang
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5404 - 5408