Structured Sparsity Models for Reverberant Speech Separation

Cited by: 42
Authors
Asaei, Afsaneh [1, 2]
Golbabaee, Mohammad [3]
Bourlard, Herve [1, 2]
Cevher, Volkan [4]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[3] Univ Paris 09, Appl Math Res Ctr CEREMADE, F-75016 Paris, France
[4] Ecole Polytech Fed Lausanne, Dept Elect Engn, CH-1015 Lausanne, Switzerland
Keywords
Distant speech recognition; image model; multiparty reverberant recordings; room acoustic modeling; source separation; structured sparse recovery; BLIND SOURCE SEPARATION; DECOMPOSITION; FRAMEWORK; SIGNALS
DOI
10.1109/TASLP.2013.2297012
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We tackle the speech separation problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform speech recovery and room acoustic modeling from recordings of concurrent unknown sources. The speakers are assumed to lie on a two-dimensional plane, and the multipath channel is characterized using the image model. We propose an algorithm for room geometry estimation that relies on localizing the early images of the speakers by sparse approximation of the spatial spectrum of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated to separate the individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real recordings of spatially stationary sources demonstrate the effectiveness of the proposed approach for speech separation and recognition.
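As a rough illustration of the image model mentioned in the abstract, the sketch below mirrors a source across the four walls of a room to obtain its first-order images, then computes the delays of the direct path and the early echoes at one microphone. The rectangular room with walls on x=0, x=Lx, y=0, y=Ly, the function names, the example coordinates, and the speed of sound of 343 m/s are all assumptions made for this example; this is not the paper's estimation algorithm, which works in the opposite direction (from localized images to room geometry).

```python
import math

def first_order_images(source, room):
    """Return the four first-order image sources of a 2-D point source.

    Assumes a rectangular room with walls on x=0, x=lx, y=0, y=ly
    (an illustrative assumption, not taken from the paper).
    """
    sx, sy = source
    lx, ly = room
    # Mirror the source across each of the four walls in turn.
    return [(-sx, sy), (2 * lx - sx, sy), (sx, -sy), (sx, 2 * ly - sy)]

def early_delays(source, mic, room, c=343.0):
    """Delays in seconds of the direct path and the first-order echoes.

    Each echo is modeled as free-space propagation from an image source,
    so its delay is the image-to-microphone distance divided by c.
    """
    paths = [source] + first_order_images(source, room)
    return [math.dist(p, mic) / c for p in paths]

# Hypothetical geometry: 4 m x 3 m room, source at (1, 2), microphone at (3, 1).
images = first_order_images((1.0, 2.0), (4.0, 3.0))
delays = early_delays((1.0, 2.0), (3.0, 1.0), (4.0, 3.0))
```

The first entry of `delays` is the direct path and the remaining four are the early reflections; their positions in time correspond to what the abstract calls the early support of the room impulse response.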
Pages: 620-633
Number of pages: 14