Structured Sparsity Models for Reverberant Speech Separation

Cited by: 42
Authors
Asaei, Afsaneh [1 ,2 ]
Golbabaee, Mohammad [3 ]
Bourlard, Herve [1 ,2 ]
Cevher, Volkan [4 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[3] Univ Paris 09, Appl Math Res Ctr CEREMADE, F-75016 Paris, France
[4] Ecole Polytech Fed Lausanne, Dept Elect Engn, CH-1015 Lausanne, Switzerland
Keywords
Distant speech recognition; image model; multiparty reverberant recordings; room acoustic modeling; source separation; structured sparse recovery; BLIND SOURCE SEPARATION; DECOMPOSITION; FRAMEWORK; SIGNALS;
DOI
10.1109/TASLP.2013.2297012
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We tackle the speech separation problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform speech recovery and room acoustic modeling from recordings of concurrent unknown sources. The speakers are assumed to lie on a two-dimensional plane, and the multipath channel is characterized using the image model. We propose an algorithm for room geometry estimation that relies on localizing the early images of the speakers by sparse approximation of the spatial spectrum of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through convex optimization, exploiting a joint sparsity model formulated on the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated to separate the individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real recordings of spatially stationary sources demonstrate the effectiveness of the proposed approach for speech separation and recognition.
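As an illustration of the localization step described above (sparse approximation of the spatial spectrum under a free-space propagation model), the following is a minimal sketch, not the paper's implementation. The microphone geometry, candidate grid, frequency, and all variable names are assumptions chosen for the example; a single matching-pursuit step stands in for the full structured sparse recovery.

```python
import numpy as np

# Illustrative setup (all values assumed): 4 microphones on a plane and a
# 2-D grid of candidate source positions.
c = 343.0            # speed of sound (m/s)
f = 1000.0           # narrowband analysis frequency (Hz)
mics = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])

# Candidate grid: 10 x 10 points covering the region of interest.
xs, ys = np.meshgrid(np.linspace(0.5, 4.0, 10), np.linspace(0.5, 4.0, 10))
grid = np.column_stack([xs.ravel(), ys.ravel()])

def steering(p):
    """Free-space narrowband propagation vector from point p to the mics."""
    d = np.linalg.norm(mics - p, axis=1)          # mic-to-source distances
    return np.exp(-2j * np.pi * f * d / c) / d    # phase delay + 1/d decay

# Dictionary with one normalized steering vector per candidate grid point.
A = np.column_stack([steering(p) for p in grid])
A /= np.linalg.norm(A, axis=0)

# Simulate a noiseless observation from one active grid point.
true_idx = 57
y = A[:, true_idx].copy()

# One matching-pursuit step: the best one-atom sparse fit is the dictionary
# column most correlated with the observation.  For multiple concurrent
# sources (the paper's setting) this step would be iterated on residuals.
est_idx = int(np.argmax(np.abs(A.conj().T @ y)))
print(grid[est_idx])  # estimated (x, y) position of the source
```

The sparse-recovery view is what makes the free-space dictionary useful for reverberant rooms: early reflections behave like additional "virtual" sources, so their images appear as further sparse peaks over the same grid.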
Pages: 620 - 633
Page count: 14
Related Papers
50 records in total
  • [31] A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments
    Hummersone, Christopher
    Mason, Russell
    Brookes, Tim
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2013, 61 (7-8) : 508 - 520
  • [32] A DUET-Based Method for Blind Separation of Speech Signals in Reverberant Environments
    Kim, Minook
    Lee, Tae-Jun
    Park, Hyung-Min
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2015, E98A (11) : 2325 - 2329
  • [33] A Blind Source Separation Based Approach for Speech Enhancement in Noisy and Reverberant Environment
    Pignotti, Alessio
    Marcozzi, Daniele
    Cifani, Simone
    Squartini, Stefano
    Piazza, Francesco
    CROSS-MODAL ANALYSIS OF SPEECH, GESTURES, GAZE AND FACIAL EXPRESSIONS, 2009, 5641 : 356 - 367
  • [34] Blind Speech Separation and Recognition System for Human Robot Interaction in Reverberant Environment
    Cho, Janghoon
    Park, Hyunsin
    Yoo, Chang D.
    2012 9TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2012 : 584 - 585
  • [35] Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement
    Li, Lu
    Jia, Maoshen
    Liu, Jinxiang
    Pai, Tun-Wen
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (10) : 6001 - 6028
  • [36] AMBISEP: AMBISONIC-TO-AMBISONIC REVERBERANT SPEECH SEPARATION USING TRANSFORMER NETWORKS
    Herzog, Adrian
    Chetupalli, Srikanth Raj
    Habets, Emanuel A. P.
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022
  • [37] Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement
    Lu Li
    Maoshen Jia
    Jinxiang Liu
    Tun-Wen Pai
    Circuits, Systems, and Signal Processing, 2023, 42 : 6001 - 6028
  • [38] Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation
    Wang, Zhong-Qiu
    Wichern, Gordon
    Le Roux, Jonathan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3476 - 3490
  • [39] Structured Discriminative Models for Speech Recognition
    Gales, Mark
    Watanabe, Shinji
    Fosler-Lussier, Eric
    IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 70 - 81
  • [40] Structured Discriminative Models for Speech Recognition
    Gales, Mark
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXII - XXII