Structured Sparsity Models for Reverberant Speech Separation

Cited by: 42
Authors
Asaei, Afsaneh [1 ,2 ]
Golbabaee, Mohammad [3 ]
Bourlard, Herve [1 ,2 ]
Cevher, Volkan [4 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[3] Univ Paris 09, Appl Math Res Ctr CEREMADE, F-75016 Paris, France
[4] Ecole Polytech Fed Lausanne, Dept Elect Engn, CH-1015 Lausanne, Switzerland
Keywords
Distant speech recognition; image model; multiparty reverberant recordings; room acoustic modeling; source separation; structured sparse recovery; BLIND SOURCE SEPARATION; DECOMPOSITION; FRAMEWORK; SIGNALS;
DOI
10.1109/TASLP.2013.2297012
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We tackle the speech separation problem through modeling the acoustics of the reverberant chambers. Our approach exploits structured sparsity models to perform speech recovery and room acoustic modeling from recordings of concurrent unknown sources. The speakers are assumed to lie on a two-dimensional plane and the multipath channel is characterized using the image model. We propose an algorithm for room geometry estimation relying on localization of the early images of the speakers by sparse approximation of the spatial spectrum of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting joint sparsity model formulated upon spatio-spectral sparsity of concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering the acoustic channels. The experiments conducted on real data recordings of spatially stationary sources demonstrate the effectiveness of the proposed approach for speech separation and recognition.
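The image model named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rectangular two-dimensional room with walls at x = 0, x = Lx, y = 0 and y = Ly, considers first-order reflections only, and uses a sound speed of 343 m/s. The function names `first_order_images` and `path_delays` and all coordinates are hypothetical, chosen for illustration.

```python
import math


def first_order_images(src, room):
    """Mirror a 2D source across the four walls of a rectangular room.

    src  -- (x, y) source position in metres (assumed inside the room)
    room -- (Lx, Ly) room dimensions; walls at x=0, x=Lx, y=0, y=Ly
    """
    x, y = src
    lx, ly = room
    return [
        (-x, y),          # image behind the wall x = 0
        (2 * lx - x, y),  # image behind the wall x = Lx
        (x, -y),          # image behind the wall y = 0
        (x, 2 * ly - y),  # image behind the wall y = Ly
    ]


def path_delays(points, mic, c=343.0):
    """Free-space propagation delay (seconds) from each point to the microphone."""
    return [math.dist(p, mic) / c for p in points]


# Hypothetical geometry: a 5 m x 4 m room, one speaker, one microphone.
room = (5.0, 4.0)
src = (1.0, 2.0)
mic = (3.0, 2.0)

images = first_order_images(src, room)
# The first entry is the direct path; the rest are first-order echoes.
delays = path_delays([src] + images, mic)
```

Each image acts as a virtual source in free space, so the early part of the room impulse response is a set of delayed copies of the direct sound. In this simplified picture, locating the images constrains the wall positions, which is the link between early echoes and room geometry that the abstract describes.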
Pages: 620 - 633
Number of pages: 14
Related papers
50 in total
  • [21] IMPROVING REVERBERANT SPEECH SEPARATION WITH SYNTHETIC ROOM IMPULSE RESPONSES
    Aralikatti, Rohith
    Ratnarajah, Anton
    Tang, Zhenyu
    Manocha, Dinesh
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 900 - 906
  • [22] Blind separation of speech with a switched sparsity and temporal criteria
    Smith, Daniel
    Burnett, Ian
    2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2006, : 136 - +
  • [23] Multi-branch Learning for Noisy and Reverberant Monaural Speech Separation
    Ma, Chao
    Li, Dongmei
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1247 - 1251
  • [24] A Performance Evaluation of Several Deep Neural Networks for Reverberant Speech Separation
    Liu, Qingju
    Wang, Wenwu
    Jackson, Philip J. B.
    Safavi, Saeid
    2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2018, : 689 - 693
  • [25] Features for Masking-Based Monaural Speech Separation in Reverberant Conditions
    Delfarah, Masood
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05) : 1085 - 1094
  • [26] Exploring permutation inconsistency in blind separation of speech signals in a reverberant environment
    Ikram, MZ
    Morgan, DR
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1041 - 1044
  • [27] Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection
    Liu, Qingju
    Aubrey, Andrew J.
    Wang, Wenwu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (06) : 1610 - 1623
  • [28] GLMSNET: SINGLE CHANNEL SPEECH SEPARATION FRAMEWORK IN NOISY AND REVERBERANT ENVIRONMENTS
    Shi, Huiyu
    Chen, Xi
    Kong, Tianlong
    Yin, Shouyi
    Ouyang, Peng
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 663 - 670
  • [29] LOCALIZATION AND BEARING ESTIMATION VIA STRUCTURED SPARSITY MODELS
    Duarte, Marco F.
    2012 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2012, : 333 - 336
  • [30] Structured Sparsity
    van de Geer, Sara
    ESTIMATION AND TESTING UNDER SPARSITY: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XLV - 2015, 2016, 2159 : 75 - 101